This module provides commonly used codecs to parse RSS-related standard
formats.
-
class libearth.codecs.Boolean(true='true', false='false', default_value=None)
Codec to interpret boolean representations in strings, e.g. 'true' or
'no', and to encode bool values back to strings.
Parameters: |
- true (str, tuple) – text to parse as True. 'true' by default
- false (str, tuple) – text to parse as False. 'false' by default
- default_value (bool) – default value when it cannot be parsed
|
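The parameters accept either a single string or a tuple of spellings. The minimal Python 3 sketch below illustrates this. BooleanSketch is a hypothetical name, not libearth's implementation. It assumes that unrecognized text decodes to default_value instead of raising an error, and that the first spelling in each tuple is used when encoding.

```python
class BooleanSketch(object):
    """Hypothetical stand-in for libearth.codecs.Boolean (not its real code)."""

    def __init__(self, true='true', false='false', default_value=None):
        # Normalize a single string into a one-element tuple of spellings.
        self.true = (true,) if isinstance(true, str) else tuple(true)
        self.false = (false,) if isinstance(false, str) else tuple(false)
        self.default_value = default_value

    def decode(self, text):
        if text in self.true:
            return True
        if text in self.false:
            return False
        # Assumption: unparseable text falls back to default_value.
        return self.default_value

    def encode(self, value):
        # Assumption: the first accepted spelling is the canonical output.
        return self.true[0] if value else self.false[0]
```

With true=('true', 'yes') and false=('false', 'no'), decoding 'no' yields False, and encoding True yields 'true'.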
-
class libearth.codecs.Enum(values)
Codec that accepts only predefined fixed types of values:
gender = Enum(['male', 'female'])
Actually it doesn't do any encoding or decoding; it simply validates
all values, both from XML and from Python.
Note that all values have to be strings.
Parameters: | values (collections.Iterable) – any iterable that yields all possible values |
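Because Enum only validates, both directions can share a single pass-through check. The sketch below shows this idea. EnumSketch is hypothetical, and it raises ValueError. libearth's codec may raise its own decoding/encoding exception types instead.

```python
class EnumSketch(object):
    """Hypothetical stand-in for libearth.codecs.Enum (not its real code)."""

    def __init__(self, values):
        self.values = frozenset(values)
        # The documented constraint: values have to be strings.
        if not all(isinstance(value, str) for value in self.values):
            raise TypeError('enum values have to be strings')

    def _validate(self, value):
        if value not in self.values:
            raise ValueError('{0!r} is not one of {1!r}'.format(
                value, sorted(self.values)))
        # No conversion: the value passes through unchanged.
        return value

    # Both directions only validate, mirroring the description above.
    decode = _validate
    encode = _validate
```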
-
class libearth.codecs.Integer
Codec to encode and decode integer numbers.
-
class libearth.codecs.Rfc3339(prefer_utc=False)
Codec to store datetime.datetime values in RFC 3339
format.
Parameters: | prefer_utc (bool) – normalize all timezones to UTC.
False by default |
-
PATTERN
(re.RegexObject) The regular expression pattern that
matches valid RFC 3339 date time strings.
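The following sketch shows how a pattern like PATTERN might drive decoding, including the prefer_utc normalization. The regular expression is a hypothetical, simplified one. libearth's real PATTERN may accept or reject different inputs, and only the decoding direction is shown.

```python
import datetime
import re

# Hypothetical, simplified pattern; libearth's real PATTERN may differ.
RFC3339_PATTERN = re.compile(
    r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
    r'[Tt](?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})'
    r'(?:\.(?P<fraction>\d+))?'
    r'(?P<tz>[Zz]|[+-]\d{2}:\d{2})$'
)


def parse_rfc3339(text, prefer_utc=False):
    """Parse an RFC 3339 string into a tz-aware datetime.datetime."""
    match = RFC3339_PATTERN.match(text)
    if match is None:
        raise ValueError('invalid RFC 3339 string: {0!r}'.format(text))
    tz = match.group('tz')
    if tz in ('Z', 'z'):
        tzinfo = datetime.timezone.utc
    else:
        sign = 1 if tz[0] == '+' else -1
        offset = datetime.timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
        tzinfo = datetime.timezone(sign * offset)
    # Pad or truncate the fraction to exactly six digits (microseconds).
    fraction = (match.group('fraction') or '0')[:6].ljust(6, '0')
    value = datetime.datetime(
        int(match.group('year')), int(match.group('month')),
        int(match.group('day')), int(match.group('hour')),
        int(match.group('minute')), int(match.group('second')),
        int(fraction), tzinfo=tzinfo)
    # prefer_utc normalizes every offset to UTC, as the option describes.
    return value.astimezone(datetime.timezone.utc) if prefer_utc else value
```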
-
class libearth.codecs.Rfc822(microseconds=False)
Codec to encode/decode datetime.datetime values to/from
RFC 822 format.
Parameters: | microseconds (bool) – whether to preserve and parse microseconds as well.
False by default since it’s not standard
compliant |
New in version 0.3.0: Added microseconds option.
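Python's email.utils formats and parses RFC 2822 dates. RFC 2822 obsoletes RFC 822, so email.utils can illustrate the round trip this codec performs. This swaps in the standard library for illustration and is not how libearth implements Rfc822. Because format_datetime() drops sub-second precision, the sketch only models microseconds=False.

```python
import email.utils


def encode_rfc822(value):
    """Format a tz-aware datetime as an RFC 822-style date string."""
    # format_datetime() uses the value's own UTC offset, e.g. "+0900",
    # and drops microseconds because the format has no fractional seconds.
    return email.utils.format_datetime(value)


def decode_rfc822(text):
    """Parse an RFC 822-style date string into a datetime."""
    return email.utils.parsedate_to_datetime(text)
```

For example, 2013-08-04 10:20:30 at UTC+09:00 encodes to 'Sun, 04 Aug 2013 10:20:30 +0900'.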
This module provides several subtle things to support
multiple Python versions (2.6, 2.7, 3.2, 3.3) and VM implementations
(CPython, PyPy).
-
libearth.compat.IRON_PYTHON = False
(bool) Whether it is IronPython or not.
-
libearth.compat.PY3 = False
(bool) Whether it is Python 3.x or not.
-
libearth.compat.UNICODE_BY_DEFAULT = False
(bool) Whether the Python VM uses Unicode strings by default.
It must be True if PY3 or IronPython.
-
libearth.compat.binary(string, var=None)
Converts string to str in Python 2.
Converts string to bytes in Python 3 or IronPython.
Parameters: |
- string (bytes, str, unicode) – a string to cast it to binary_type
- var (str) – an optional variable name to be used for error message
|
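Here is a minimal sketch of what binary() does on Python 3. binary_sketch is a hypothetical name. The sketch assumes that text is encoded as UTF-8 and that var only labels the error message; the real function may choose a different encoding or error type.

```python
def binary_sketch(string, var=None):
    """Hypothetical Python 3 stand-in for libearth.compat.binary()."""
    if isinstance(string, bytes):
        # Already binary_type: return it unchanged.
        return string
    if isinstance(string, str):
        # Assumption: text is encoded as UTF-8.
        return string.encode('utf-8')
    # var, when given, names the offending variable in the error message.
    raise TypeError('{0} must be a string, not {1!r}'.format(
        var or 'string', string))
```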
-
libearth.compat.binary_type
(type) Type for representing binary data. str in Python 2
and bytes in Python 3.
alias of str
-
libearth.compat.encode_filename(filename)
If filename is a text_type, encode it to
binary_type according to filesystem’s default encoding.
-
libearth.compat.file_types = (<class 'io.RawIOBase'>, <type 'file'>)
(type, tuple) Types for file objects that have
fileno().
-
libearth.compat.string_type
(type) Type for text data. basestring in Python 2
and str in Python 3.
alias of basestring
-
libearth.compat.text(string)
Converts string to str in Python 3 or IronPython.
Does nothing in Python 2.
-
libearth.compat.text_type
(type) Type for representing Unicode textual data.
unicode in Python 2 and str in Python 3.
alias of unicode
-
class libearth.compat.xrange
The xrange() function. Alias for range() in Python 3.
This proxy module offers a compatibility layer between several ElementTree
implementations.
It provides the following functions:
-
libearth.compat.etree.fromstring(string)
Parse the given XML string.
Parameters: | string (str, bytes, basestring) – xml string to parse |
Returns: | the element tree object |
-
libearth.compat.etree.fromstringlist(iterable)
Parse the given chunks of XML string.
Parameters: | iterable (collections.Iterable) – chunks of xml string to parse |
Returns: | the element tree object |
-
libearth.compat.etree.tostring(tree)
Generate an XML string from the given element tree.
Parameters: | tree – an element tree object to serialize |
Returns: | an xml string |
Return type: | str, bytes |
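The standard library's xml.etree.ElementTree has functions with the same names. The sketch below therefore uses that module directly, rather than libearth.compat.etree, to show a document parsed from chunks and serialized back.

```python
import xml.etree.ElementTree as etree

ATOM = '{http://www.w3.org/2005/Atom}'

# fromstringlist() accepts the document split into arbitrary chunks.
chunks = [b'<feed xmlns="http://www.w3.org/2005/Atom">',
          b'<title>Example</title>',
          b'</feed>']
tree = etree.fromstringlist(chunks)
title = tree.find(ATOM + 'title').text

# tostring() returns bytes by default; the namespace prefix is generated.
serialized = etree.tostring(tree)
reparsed = etree.fromstring(serialized)
```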
libearth.compat.clrxmlreader — XML parser implementation for CLR
Python xml.sax parser implementation and ElementTree builder using
CLR System.Xml.XmlReader.
-
libearth.compat.clrxmlreader.XMLNS_XMLNS = 'http://www.w3.org/2000/xmlns/'
(str) The reserved namespace URI for XML namespace.
-
class libearth.compat.clrxmlreader.IteratorStream(iterable)
System.IO.Stream implementation that takes a Python iterable
and then transforms it into a CLR stream.
Parameters: | iterable (collections.Iterable) – a Python iterable to transform |
-
class libearth.compat.clrxmlreader.TreeBuilder
ElementTree builder using System.Xml.XmlReader.
-
class libearth.compat.clrxmlreader.XmlReader
SAX PullReader implementation
using CLR System.Xml.XmlReader.
-
libearth.compat.clrxmlreader.create_parser()
Create a new XmlReader() parser instance.
Returns: | a new parser instance |
Return type: | XmlReader |
-
class libearth.compat.xmlpullreader.PullReader
SAX parser interface which provides similar but slightly less power
than IncrementalParser.
IncrementalParser can be fed chunks of bytes of arbitrary length,
but it can't determine how many bytes it should be fed.
-
close()
This method is called when the entire XML document has been
passed to the parser through the feed method, to notify the
parser that there is no more data. This allows the parser to
do the final checks on the document and empty the internal
data buffer.
The parser will not be ready to parse another document until
the reset method has been called.
close() may raise SAXException.
-
feed()
This method makes the parser parse the next node,
emitting the corresponding events.
feed() may raise SAXException.
Returns: | whether the stream buffer is not empty yet |
Return type: | bool |
Raises xml.sax.SAXException: |
| when something goes wrong |
-
prepareParser(iterable)
This method is called by the parse implementation to allow
the SAX 2.0 driver to prepare itself for parsing.
Parameters: | iterable (collections.Iterable) – iterable of bytes |
-
reset()
This method is called after close has been called to reset
the parser so that it is ready to parse new documents.
The results of calling parse or feed after close without calling
reset are undefined.
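PullReader itself isn't reproduced here. As a loosely comparable illustration of pull-style parsing, the standard library's xml.etree.ElementTree.XMLPullParser (Python 3.4+) lets the caller feed arbitrary chunks and then pull out only the events those bytes completed. This swaps in a different API. Unlike PullReader.feed(), it does not advance exactly one node per call.

```python
import xml.etree.ElementTree as etree


def pull_events(chunks):
    """Yield (event, tag) pairs as soon as the fed bytes complete them."""
    parser = etree.XMLPullParser(events=('start', 'end'))
    for chunk in chunks:
        parser.feed(chunk)
        # Only events made complete by the bytes fed so far are available.
        for event, element in parser.read_events():
            yield event, element.tag
    # close() flushes the parser; drain whatever events remain.
    parser.close()
    for event, element in parser.read_events():
        yield event, element.tag
```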
Crawl feeds.
-
libearth.crawler.DEFAULT_TIMEOUT = 10
(numbers.Integral) The default timeout for connection attempts.
10 seconds.
-
exception libearth.crawler.CrawlError(feed_uri, *args, **kwargs)
Error which is raised when crawling the given url fails.
New in version 0.3.0: Added feed_uri parameter and corresponding feed_uri
attribute.
-
feed_uri = None
(str) The uri of the feed that failed to be crawled.
-
class libearth.crawler.CrawlResult(url, feed, hints, icon_url=None)
The result of each crawl of a feed.
It mimics a triple of (url, feed, hints) for
backward compatibility with versions below 0.3.0, so you can still get these
values using tuple unpacking, though that is no longer the recommended way to
get them.
-
add_as_subscription(subscription_set)
Add it as a subscription to the given subscription_set.
Parameters: | subscription_set (SubscriptionSet) – a subscription list or category to add
a new subscription |
Returns: | the created subscription object |
Return type: | Subscription |
-
feed = None
(Feed) The crawled feed.
-
hints = None
(collections.Mapping) The extra hints for the crawler
e.g. skipHours, skipMinutes, skipDays.
It might be None.
-
icon_url = None
(str) The favicon url of the feed if it exists.
It might be None.
-
url = None
(str) The crawled feed url.
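One way to get both attribute access and legacy triple unpacking is to make the result iterable over exactly three values. The sketch below is hypothetical (CrawlResultSketch is not libearth's class) and shows only that compatibility mechanism.

```python
class CrawlResultSketch(object):
    """Hypothetical stand-in for libearth.crawler.CrawlResult."""

    def __init__(self, url, feed, hints, icon_url=None):
        self.url = url
        self.feed = feed
        self.hints = hints
        self.icon_url = icon_url

    def __iter__(self):
        # Pre-0.3.0 code unpacks a triple, so icon_url is left out here.
        return iter((self.url, self.feed, self.hints))
```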
-
libearth.crawler.crawl(feed_urls, pool_size, timeout=10)
Crawl feeds in the feed list using threads.
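Parameters: |
- feed_urls (collections.Iterable) – an iterable of feed urls to crawl
- pool_size (numbers.Integral) – the number of worker threads
- timeout (numbers.Integral) – the number of seconds to wait before a connection attempt times out. 10 seconds by default
|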
Returns: | a set of CrawlResult objects
|
Return type: | collections.Iterable
|
Changed in version 0.3.0: It now returns a set of CrawlResult objects instead of
tuples.
Changed in version 0.3.0: The parameter feeds was renamed to feed_urls.
New in version 0.3.0: Added optional timeout parameter.
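The thread-pool structure described above can be sketched with concurrent.futures. The fetch argument is a hypothetical callable standing in for the real download-and-parse step. The sketch returns (url, result) pairs rather than CrawlResult objects and lets exceptions propagate instead of wrapping them in CrawlError.

```python
import concurrent.futures

DEFAULT_TIMEOUT = 10


def crawl_sketch(feed_urls, pool_size, fetch, timeout=DEFAULT_TIMEOUT):
    """Run fetch(url, timeout) for every url on pool_size worker threads.

    fetch is a hypothetical callable standing in for the real download and
    parse step; the results are (url, fetched) pairs, not CrawlResults.
    """
    results = set()
    with concurrent.futures.ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = {pool.submit(fetch, url, timeout): url for url in feed_urls}
        # Collect results in completion order, not submission order.
        for future in concurrent.futures.as_completed(futures):
            # result() re-raises any exception fetch() raised for that url.
            results.add((futures[future], future.result()))
    return results
```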
libearth internally stores archive data in Atom format. It's not exactly
a complete implementation of RFC 4287, but a subset covering most of it.
Since it's intended not for crawling but for internal representation, it does not
follow the robustness principle or anything similar. It simply assumes that stored data are
all valid and well-formed.
-
libearth.feed.ATOM_XMLNS = 'http://www.w3.org/2005/Atom'
(str) The XML namespace name used for Atom (RFC 4287).
-
libearth.feed.MARK_XMLNS = 'http://earthreader.org/mark/'
(str) The XML namespace name used for Earth Reader Mark
metadata.
-
class libearth.feed.Category(_parent=None, **attributes)
Bases: libearth.schema.Element
Category element defined in RFC 4287 (section 4.2.2).
-
label
(str) The optional human-readable label for display in
end-user applications. It corresponds to label attribute of
RFC 4287 (section 4.2.2.3).
-
scheme_uri
(str) The URI that identifies a categorization scheme.
It corresponds to scheme attribute of RFC 4287
(section 4.2.2.2).
-
term
(str) The required machine-readable identifier string of
the category. It corresponds to term attribute of
RFC 4287 (section 4.2.2.1).
-
class libearth.feed.Content(_parent=None, **attributes)
Bases: libearth.feed.Text
Content construct defined in RFC 4287
(section 4.1.3).
-
MIMETYPE_PATTERN
(re.RegexObject) The regular expression pattern that matches
valid MIME type strings.
-
TYPE_MIMETYPE_MAP = {'text': 'text/plain', 'xhtml': 'application/xhtml+xml', 'html': 'text/html'}
(collections.Mapping) The mapping of type string
(e.g. 'text') to the corresponding MIME type
(e.g. text/plain).
-
mimetype
(str) The mimetype of the content.
-
source_uri
(str) An optional remote content URI to retrieve the content.
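RFC 4287 lets atom:content's type attribute be either one of the short names text, html, or xhtml, or a full MIME media type. The sketch below shows how TYPE_MIMETYPE_MAP and a pattern could combine to resolve mimetype. The regular expression and resolve_mimetype() are hypothetical; libearth's MIMETYPE_PATTERN may be stricter.

```python
import re

TYPE_MIMETYPE_MAP = {'text': 'text/plain', 'xhtml': 'application/xhtml+xml',
                     'html': 'text/html'}

# Hypothetical, simplified pattern; libearth's MIMETYPE_PATTERN may differ.
MIMETYPE_PATTERN = re.compile(r'^[A-Za-z0-9!#$&.+^_-]+/[A-Za-z0-9!#$&.+^_-]+$')


def resolve_mimetype(type_):
    """Turn an atom:content type attribute value into a MIME type."""
    # Short names map through TYPE_MIMETYPE_MAP ...
    if type_ in TYPE_MIMETYPE_MAP:
        return TYPE_MIMETYPE_MAP[type_]
    # ... while a full MIME media type is used as is.
    if MIMETYPE_PATTERN.match(type_):
        return type_
    raise ValueError('invalid content type: {0!r}'.format(type_))
```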
-
class libearth.feed.Entry(_parent=None, **kwargs)
Bases: libearth.schema.DocumentElement, libearth.feed.Metadata
Represent an individual entry, acting as a container for metadata and
data associated with the entry. It corresponds to atom:entry element
of RFC 4287 (section 4.1.2).
-
content
(Content) It either contains or links to the content of
the entry.
It corresponds to atom:content element of RFC 4287
(section 4.1.3).
-
published_at
(datetime.datetime) The tz-aware datetime
indicating an instant in time associated with an event early in the
life cycle of the entry. Typically, published_at will be
associated with the initial creation or first availability of
the resource. It corresponds to atom:published element of
RFC 4287 (section 4.2.9).
-
read
(Mark) Whether and when it’s read or unread.
-
source
(Source) If an entry is copied from one feed into another
feed, then the source feed’s metadata may be preserved within
the copied entry by adding source if it is not already present
in the entry, and including some or all of the source feed’s metadata
as the source's data.
It is designed to allow the aggregation of entries from different feeds
while retaining information about an entry's source feed.
It corresponds to atom:source element of RFC 4287
(section 4.2.11).
-
starred
(Mark) Whether and when it’s starred or unstarred.
-
summary
(Text) The text field that conveys a short summary, abstract,
or excerpt of the entry. It corresponds to atom:summary element
of RFC 4287 (section 4.2.13).
-
class libearth.feed.Feed(_parent=None, **kwargs)
Bases: libearth.session.MergeableDocumentElement, libearth.feed.Source
Atom feed document, acting as a container for metadata and data
associated with the feed.
It corresponds to atom:feed element of RFC 4287
(section 4.1.1).
-
entries
(collections.MutableSequence) The list of Entry objects
that represent an individual entry, acting as a container for metadata
and data associated with the entry.
It corresponds to atom:entry element of RFC 4287
(section 4.1.2).
-
class libearth.feed.Generator(_parent=None, **attributes)
Bases: libearth.schema.Element
Identify the agent used to generate a feed, for debugging and
other purposes. It corresponds to atom:generator element
of RFC 4287 (section 4.2.4).
-
uri
(str) A URI that represents something relevant to the agent.
-
value
(str) The human-readable name for the generating agent.
-
version
(str) The version of the generating agent.
-
class libearth.feed.Link(_parent=None, **attributes)
Bases: libearth.schema.Element
Link element defined in RFC 4287 (section 4.2.7).
-
byte_size
(numbers.Integral) The optional hint for the length of
the linked content in octets. It corresponds to length attribute
of RFC 4287 (section 4.2.7.6).
-
html
(bool) Whether its mimetype is HTML (or XHTML).
-
language
(str) The language of the linked content. It corresponds
to hreflang attribute of RFC 4287 (section
4.2.7.4).
-
mimetype
(str) The optional hint for the MIME media type of the linked
content. It corresponds to type attribute of
RFC 4287 (section 4.2.7.3).
-
relation
(str) The relation type of the link. It corresponds to
rel attribute of RFC 4287 (section 4.2.7.2).
See also
- Existing rel values — Microformats Wiki
- This page contains tables of known HTML rel values from
specifications, formats, proposals, brainstorms, and non-trivial
POSH usage in the wild. In addition, dropped and rejected
values are listed at the end for comprehensiveness.
-
title
(str) The title of the linked resource. It corresponds to
title attribute of RFC 4287 (section 4.2.7.5).
-
uri
(str) The link’s required URI. It corresponds to href
attribute of RFC 4287 (section 4.2.7.1).
-
class libearth.feed.LinkList
Bases: _abcoll.MutableSequence
Element list mixin specialized for Link.
-
favicon
(Link) Find the link to a favicon, also known as
a shortcut or bookmark icon, if it exists.
-
filter_by_mimetype(pattern)
Filter links by their mimetype e.g.:
links.filter_by_mimetype('text/html')
pattern can include wildcards (*) as well e.g.:
links.filter_by_mimetype('application/xml+*')
Parameters: | pattern (str) – the mimetype pattern to filter |
Returns: | the filtered links |
Return type: | LinkList |
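Wildcard matching of this kind can be sketched with fnmatch. Link here is a hypothetical minimal record, not libearth's Link class. The sketch returns a plain list rather than a LinkList. Note that fnmatch also treats ? and [...] as wildcards, which the real method may not do.

```python
import collections
import fnmatch

# Hypothetical minimal link record; libearth's Link has more fields.
Link = collections.namedtuple('Link', ['uri', 'mimetype', 'relation'])


def filter_by_mimetype(links, pattern):
    """Return the links whose mimetype matches pattern ('*' is a wildcard)."""
    # fnmatchcase() is case-sensitive and also treats ? and [...] specially.
    return [link for link in links
            if link.mimetype is not None
            and fnmatch.fnmatchcase(link.mimetype, pattern)]
```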
-
permalink
(Link) Find the permalink from the list. The following
list shows precedence of lookup conditions:
- html, and relation is 'alternate'
- html
- relation is 'alternate'
- No permalink: return None
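The precedence list reads naturally as an ordered series of predicates, where the first predicate matched by any link wins. This is a sketch under assumptions: Link is a hypothetical minimal record, and "html" is taken to mean a text/html or application/xhtml+xml mimetype, following the html attribute's description.

```python
import collections

# Hypothetical minimal link record; libearth's Link has more fields.
Link = collections.namedtuple('Link', ['uri', 'mimetype', 'relation'])

HTML_MIMETYPES = ('text/html', 'application/xhtml+xml')


def find_permalink(links):
    """Pick a permalink using the documented precedence of conditions."""
    def is_html(link):
        return link.mimetype in HTML_MIMETYPES

    conditions = [
        lambda link: is_html(link) and link.relation == 'alternate',
        is_html,
        lambda link: link.relation == 'alternate',
    ]
    # Earlier conditions win, even over links that appear first in the list.
    for condition in conditions:
        for link in links:
            if condition(link):
                return link
    return None
```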
-
class libearth.feed.Mark(_parent=None, **attributes)
Bases: libearth.schema.Element
Represent whether the entry is read, starred, or tagged by user.
It's not part of the RFC 4287 Atom standard, but an extension for
Earth Reader.
-
marked
(bool) Whether it’s marked or not.
-
updated_at
(datetime.datetime) Updated time.
-
class libearth.feed.Metadata(_parent=None, **attributes)
Bases: libearth.schema.Element
Common metadata shared by Source, Entry, and
Feed.
-
authors
(collections.MutableSequence) The list of Person
objects which indicates the author of the entry or feed. It corresponds
to atom:author element of RFC 4287 (section 4.2.1).
-
categories
(collections.MutableSequence) The list of Category
objects that conveys information about categories associated with
an entry or feed. It corresponds to atom:category element of
RFC 4287 (section 4.2.2).
-
contributors
(collections.MutableSequence) The list of Person
objects which indicates a person or other entity who contributed to
the entry or feed. It corresponds to atom:contributor element of
RFC 4287 (section 4.2.3).
-
id
(str) The URI that conveys a permanent, universally unique
identifier for an entry or feed. It corresponds to atom:id
element of RFC 4287 (section 4.2.6).
-
links
(LinkList) The list of Link objects
that define a reference from an entry or feed to a web resource.
It corresponds to atom:link element of RFC 4287
(section 4.2.7).
-
rights
(Text) The text field that conveys information about rights
held in and of an entry or feed. It corresponds to atom:rights
element of RFC 4287 (section 4.2.10).
-
title
(Text) The human-readable title for an entry or feed.
It corresponds to atom:title element of RFC 4287
(section 4.2.14).
-
updated_at
(datetime.datetime) The tz-aware datetime
indicating the most recent instant in time when the entry was modified
in a way the publisher considers significant. Therefore, not all
modifications necessarily result in a changed updated_at value.
It corresponds to atom:updated element of RFC 4287
(section 4.2.15).
-
class libearth.feed.Person(_parent=None, **attributes)
Bases: libearth.schema.Element
Person construct defined in RFC 4287 (section 3.2).
-
email
(str) The optional email address associated with the person.
It corresponds to atom:email element of RFC 4287
(section 3.2.3).
-
name
(str) The human-readable name for the person. It corresponds
to atom:name element of RFC 4287 (section 3.2.1).
-
uri
(str) The optional URI associated with the person.
It corresponds to atom:uri element of RFC 4287
(section 3.2.2).
-
class libearth.feed.Source(_parent=None, **attributes)
Bases: libearth.feed.Metadata
All metadata for Feed except Feed.entries.
It corresponds to atom:source element of RFC 4287
(section 4.2.11).
-
generator
(Generator) Identify the agent used to generate a feed,
for debugging and other purposes. It corresponds to
atom:generator element of RFC 4287 (section 4.2.4).
-
icon
(str) URI that identifies an image that provides iconic
visual identification for a feed. It corresponds to atom:icon
element of RFC 4287 (section 4.2.5).
-
logo
(str) URI that identifies an image that provides visual
identification for a feed. It corresponds to atom:logo element
of RFC 4287 (section 4.2.8).
-
subtitle
(Text) A text that conveys a human-readable description or
subtitle for a feed. It corresponds to atom:subtitle element of
RFC 4287 (section 4.2.12).
-
class libearth.feed.Text(_parent=None, **attributes)
Bases: libearth.schema.Element
Text construct defined in RFC 4287 (section 3.1).
-
sanitized_html
(str) The secure HTML string of the text. If it’s
a plain text, this becomes an entity-escaped HTML string (for example,
'<Hello>' becomes '&lt;Hello&gt;'), and if it's an HTML text,
the value is sanitized (for example,
'<script>alert(1);</script><p>Hello</p>' becomes '<p>Hello</p>').
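The plain-text branch is ordinary entity escaping, which html.escape covers. The sketch below shows only that dispatch. The sanitize argument is a hypothetical callable standing in for a real HTML sanitizer, because safe HTML sanitization is out of scope for a short example.

```python
import html


def sanitized_html_sketch(type_, value, sanitize):
    """Return safe HTML for a Text construct's value.

    sanitize is a hypothetical callable standing in for a real HTML
    sanitizer; this sketch does not implement sanitization itself.
    """
    if type_ == 'text':
        # Plain text only needs entity escaping (&, < and >).
        return html.escape(value, quote=False)
    if type_ == 'html':
        return sanitize(value)
    raise ValueError('unsupported text type: {0!r}'.format(type_))
```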
-
type
(str) The type of the text. It could be one of 'text'
or 'html'. It corresponds to RFC 4287 (section
3.1.1).
Note
It currently does not support 'xhtml'.
-
value
(str) The content of the text. It has to be interpreted
differently depending on its type. It corresponds to
RFC 4287 (section 3.1.1.1) if type is
'text', and RFC 4287 (section 3.1.1.2) if
type is 'html'.
Repository abstracts the storage backend, e.g. the filesystem.
Some platforms, e.g. iOS, have no way to access the file system
directly, and in that case the concept of repository lets you store
data directly in Dropbox or Google Drive instead of the filesystem.
In most cases, however, we will simply use FileSystemRepository
even if data are synchronized using Dropbox or rsync.
In order to make the repository highly configurable, it provides a way
to look up and instantiate the repository from a url. For example,
the following url will load FileSystemRepository with its
path set to /home/dahlia/.earthreader/:
file:///home/dahlia/.earthreader/
For extensibility, every repository class has to implement from_url()
and to_url() methods, and be registered as an entry point in the
libearth.repositories group, e.g.:
[libearth.repositories]
file = libearth.repository:FileSystemRepository
Note that the entry point name (file in the above example) becomes
the url scheme to lookup the corresponding repository class
(libearth.repository.FileSystemRepository in the above example).
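The lookup described above can be sketched with a plain dict standing in for the libearth.repositories entry point group. Every name below is hypothetical. Real entry point discovery through packaging metadata is deliberately left out, and the stand-in class performs no file I/O.

```python
from urllib.parse import urlparse


class FileSystemRepositorySketch(object):
    """Hypothetical stand-in for FileSystemRepository (no file I/O here)."""

    def __init__(self, path):
        self.path = path

    @classmethod
    def from_url(cls, url):
        return cls(urlparse(url).path)

    def to_url(self, scheme):
        # A path starting with "/" yields e.g. file:///home/...
        return '{0}://{1}'.format(scheme, self.path)


# Stand-in for the libearth.repositories entry point group:
# entry point name (the url scheme) -> repository class.
REPOSITORY_CLASSES = {'file': FileSystemRepositorySketch}


def repository_from_url(url):
    """Look up the repository class by url scheme and instantiate it."""
    scheme = urlparse(url).scheme
    try:
        cls = REPOSITORY_CLASSES[scheme]
    except KeyError:
        raise LookupError('no repository for scheme {0!r}'.format(scheme))
    return cls.from_url(url)
```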
-
class libearth.repository.FileIterator(path, buffer_size)
Read a file through the iterator protocol,
with automatic closing of the file when it ends.
Parameters: |
- path (str) – the path of file
- buffer_size (numbers.Integral) – the number of bytes to be produced at each step
|
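A generator with a with block gives the same iterate-then-close behavior. This hypothetical file_iterator() function is a sketch and not the FileIterator class itself.

```python
def file_iterator(path, buffer_size):
    """Yield chunks of at most buffer_size bytes from the file at path."""
    # The with block closes the file once the last chunk has been yielded
    # (or earlier, if the generator is closed or garbage-collected).
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(buffer_size)
            if not chunk:
                break
            yield chunk
```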
-
exception libearth.repository.FileNotFoundError
Raised when a given path does not exist.
-
class libearth.repository.FileSystemRepository(path, mkdir=True, atomic=False)
Builtin implementation of Repository interface which uses
the ordinary file system.
Parameters: |
- path (str) – the directory path to store keys
- mkdir (bool) – create the directory if it doesn’t exist yet.
True by default
- atomic – make the update invisible until it’s complete.
False by default
|
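Raises: |
- FileNotFoundError – when the path doesn't exist
- NotADirectoryError – when the path is not a directory
|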
-
path = None
(str) The path of the directory to read and write data files.
It should be readable and writable.
-
exception libearth.repository.NotADirectoryError
Raised when a given path is not a directory.
-
class libearth.repository.Repository
Repository interface agnostic to its underlying storage implementation.
Stage objects can deal with documents to be stored
using the interface.
All content in a repository is accessible by key. Keys
abstract out “filenames” in “file systems”, hence they share common
concepts with filenames. Keys are hierarchical, like file paths, so they
consist of multiple sequential strings e.g. ['dir', 'subdir', 'key'].
You can list() all subkeys in the upper key as well e.g.:
repository.list(['dir', 'subdir'])
-
exists(key)
Return whether the key exists or not. It returns False
if it doesn’t exist instead of raising RepositoryKeyError.
Parameters: | key (collections.Sequence) – the key to find whether it exists |
Returns: | True only if the given key exists,
or False otherwise |
Return type: | bool |
-
classmethod from_url(url)
Create a new instance of the repository from the given url.
It’s used for configuring the repository in plain text
e.g. *.ini.
Note
Every subclass of Repository has to override
from_url() static/class method to implement details.
-
list(key)
List all subkeys in the key.
Parameters: | key (collections.Sequence) – the incomplete key that might have subkeys |
Returns: | the set of subkeys (set of strings, not set of string lists) |
Return type: | collections.Set |
Raises RepositoryKeyError: |
| the key cannot be found in
the repository, or it’s not a directory |
Note
Every subclass of Repository has to override
list() method to implement details.
-
read(key)
Read the content from the key.
Parameters: | key (collections.Sequence) – the key which stores the content to read |
Returns: | byte string chunks |
Return type: | collections.Iterable |
Raises RepositoryKeyError: |
| the key cannot be found in
the repository, or it’s not a file |
Note
Every subclass of Repository has to override
read() method to implement details.
-
to_url(scheme)
Generate a url that from_url() can accept.
It’s used for configuring the repository in plain text
e.g. *.ini. The URL scheme is determined by the caller
and passed as an argument.
Parameters: | scheme – a determined url scheme |
Returns: | a url that from_url() can accept |
Return type: | str |
-
write(key, iterable)
Write the iterable into the key.
Parameters: |
- key (collections.Sequence) – the key to store the iterable in
- iterable (collections.Iterable) – the iterable object that yields chunks of the whole
content. every chunk has to be a byte string
|
Note
Every subclass of Repository has to override
write() method to implement details.
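To make the key semantics concrete, here is a hypothetical in-memory implementation of the four operations. MemoryRepository is not part of libearth. It does not subclass the real Repository and does not implement from_url() or to_url(). Hierarchical keys are stored as tuples, and a "directory" exists whenever some stored key starts with it.

```python
class RepositoryKeyError(LookupError):
    """Hypothetical stand-in for libearth.repository.RepositoryKeyError."""

    def __init__(self, key):
        super(RepositoryKeyError, self).__init__(key)
        # The requested key, mirroring the documented key attribute.
        self.key = key


class MemoryRepository(object):
    """Hypothetical in-memory repository; keys are sequences of strings."""

    def __init__(self):
        self._files = {}  # maps tuple(key) -> bytes

    def write(self, key, iterable):
        # Every chunk has to be a byte string, as the interface requires.
        self._files[tuple(key)] = b''.join(iterable)

    def read(self, key):
        try:
            return [self._files[tuple(key)]]
        except KeyError:
            raise RepositoryKeyError(key)

    def exists(self, key):
        key = tuple(key)
        # True for a stored file and for any "directory" prefix of one.
        return any(stored[:len(key)] == key for stored in self._files)

    def list(self, key):
        key = tuple(key)
        subkeys = set(stored[len(key)] for stored in self._files
                      if len(stored) > len(key) and stored[:len(key)] == key)
        if not subkeys:
            # Missing keys and plain files both have no subkeys.
            raise RepositoryKeyError(key)
        return subkeys
```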
-
exception libearth.repository.RepositoryKeyError(key, *args, **kwargs)
Exception which is raised when the requested key cannot be found
in the repository.
-
key = None
(collections.Sequence) The requested key.
-
libearth.repository.from_url(url)
Load the repository instance from the given configuration url.
libearth.schema — Declarative schema for pulling DOM parser of XML
There are two well-known ways to parse XML:
- Document Object Model
- It reads the whole XML and then makes a tree in memory. You can easily
traverse the document as a tree, but the parsing can't be streamed.
Moreover it uses memory for data you don’t use.
- Simple API for XML
- It's an event-based sequential access parser. This means you need to
listen for events from it and then make use of its still unstructured data
yourself. In other words, you don't need to spend memory on data
you never use if you simply ignore them when you receive
the event.
The pros and cons of these two approaches are obvious, but there is
another way to parse XML: mixing them.
The basic idea of this pulling DOM parser (which this module implements)
is that the parser consumes the stream just in time, when you actually
reach the child node. This relies on one assumption: the parsed XML
has a schema. If the document is schema-free, this heuristic approach
loses most of its efficiency.
So the parser needs information about the schema of the XML document
it parses, and we can declare the schema by defining classes. It's
like an ORM for XML. For example, suppose there is a small XML document:
<?xml version="1.0"?>
<person version="1.0">
    <name>Hong Minhee</name>
    <url>http://dahlia.kr/</url>
    <url>https://github.com/dahlia</url>
    <url>https://bitbucket.org/dahlia</url>
    <dob>1988-08-04</dob>
</person>
You can declare the schema for this like the following class definition:
class Person(DocumentElement):
    __tag__ = 'person'
    format_version = Attribute('version')
    name = Text('name')
    url = Child('url', URL, multiple=True)
    dob = Child('dob', Date)
-
libearth.schema.PARSER_LIST = []
(collections.Sequence) The list of xml.sax parser
implementations to try to import.
-
libearth.schema.SCHEMA_XMLNS = 'http://earthreader.org/schema/'
(str) The XML namespace name used for schema metadata.
-
class libearth.schema.Attribute(name, codec=None, xmlns=None, required=False, default=None, encoder=None, decoder=None)
Declare possible element attributes as a descriptor.
Parameters: |
- name (str) – the XML attribute name
- codec (Codec, collections.Callable) – an optional codec object to use. if it’s callable and
not an instance of Codec, its return value will
be used instead. it means this can take class object of
Codec subtype that is not instantiated yet
unless the constructor requires any arguments
- xmlns (str) – an optional XML namespace URI
- required (bool) – whether the attribute is required or not.
False by default
- default (collections.Callable) – an optional function that returns default value when
the attribute is not present. the function takes an
argument which is an Element instance
- encoder (collections.Callable) – an optional function that encodes Python value into
XML text value e.g. str(). the encoder function
has to take an argument
- decoder (collections.Callable) – an optional function that decodes XML text value into
Python value e.g. int(). the decoder function
has to take a string argument
|
Changed in version 0.2.0: The default option now accepts only callable objects.
Before 0.2.0, default was not a function but a value which
was simply used as is.
-
default = None
(collections.Callable) The function that returns default
value when the attribute is not present. The function takes an
argument which is an Element instance.
Changed in version 0.2.0: It now accepts only callable objects. Before 0.2.0,
the default attribute was not a function but a value which
was simply used as is.
-
key_pair = None
(tuple) The pair of (xmlns, name).
-
name = None
(str) The XML attribute name.
-
required = None
(bool) Whether it is required for the element.
-
xmlns = None
(str) The optional XML namespace URI.
-
class libearth.schema.Child(tag, element_type, xmlns=None, required=False, multiple=False, sort_key=None, sort_reverse=None)
Declare a possible child element as a descriptor.
In order to have a Child whose element type is not
defined yet (or is self-referential), pass the class name of the element
type to contain. The name will be lazily evaluated e.g.:
class Person(Element):
    '''Everyone can have children, who are also Persons.'''
    children = Child('child', 'Person', multiple=True)
Parameters: |
- tag (str) – the tag name
- xmlns (str) – an optional XML namespace URI
- element_type (type, str) – the type of child element(s).
it has to be a subtype of Element.
if it’s a string it means referring the class name
which is going to be lazily evaluated
- required (bool) – whether the child is required or not.
it's mutually exclusive with multiple.
False by default
- multiple (bool) – whether the child can be multiple.
it's mutually exclusive with required.
False by default
- sort_key (collections.Callable) – an optional function to be used for sorting
multiple child elements. it has to take a child as
Element and return a value for sort key.
it is the same as the key option of sorted()
built-in function.
note that it doesn't guarantee that all elements are
sorted at runtime, but all elements become sorted
when they're written using the write() function.
it's available only when multiple is True.
use sort_reverse for descending order.
- sort_reverse (bool) – whether to reverse elements when they become
sorted. it is the same as the reverse option of
sorted() built-in function.
it’s available only when sort_key is present
|
-
element_type
(type) The class of element(s) this child can contain. It must
be a subtype of Element.
-
class libearth.schema.Codec
Abstract base class for codecs to serialize Python values to be
stored in XML and deserialize XML texts to Python values.
In most cases encoding and decoding are implementation details of
a well-defined format, so these two functions can be
paired. The interface relies on that idea.
To implement a codec, you have to subclass Codec and
override a pair of methods: encode() and decode().
Codec objects are accepted by Attribute, Text, and
Content (all of which subclass CodecDescriptor).
-
decode(text)
Decode the given XML text to Python value.
Parameters: | text (str) – XML text to decode |
Returns: | the decoded Python value |
Raises DecodeError: |
| when decoding the given XML text goes wrong |
Note
Every Codec subtype has to override this method.
-
encode(value)
Encode the given Python value into XML text.
Parameters: | value – Python value to encode |
Returns: | the encoded XML text |
Return type: | str |
Raises EncodeError: |
| when encoding the given value goes wrong |
Note
Every Codec subtype has to override this method.
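The subclass-and-override contract can be sketched with abc. CodecSketch and IntegerSketch are hypothetical stand-ins, and the sketch raises built-in TypeError/ValueError where the documented interface raises EncodeError/DecodeError.

```python
import abc


class CodecSketch(abc.ABC):
    """Hypothetical stand-in for libearth.schema.Codec."""

    @abc.abstractmethod
    def encode(self, value):
        """Encode a Python value into XML text."""

    @abc.abstractmethod
    def decode(self, text):
        """Decode XML text into a Python value."""


class IntegerSketch(CodecSketch):
    """Paired encode/decode for integers, in the spirit of Integer."""

    def encode(self, value):
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError('expected an integer, not {0!r}'.format(value))
        return str(value)

    def decode(self, text):
        # int() raises ValueError on malformed text; the documented
        # interface would raise DecodeError instead.
        return int(text)
```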
-
class libearth.schema.CodecDescriptor(codec=None, encoder=None, decoder=None)
Mixin class for descriptors that provide decoder() and
encoder().
Attribute, Content and Text can take
encoder and decoder functions for them. It’s used for encoding
from Python values to XML string and decoding raw values from XML to
natural Python representations.
It can take a codec, or an encoder and a decoder separately.
(Of course they all can be present at a time.) In most cases,
you'll only need the codec parameter, in which the encoder and decoder
are coupled:
Text('dob', Rfc3339(prefer_utc=True))
Encoders can be specified using encoder parameter of descriptor’s
constructor, or encoder() decorator.
Decoders can be specified using decoder parameter of descriptor’s
constructor, or decoder() decorator:
import datetime
class Person(DocumentElement):
    __tag__ = 'person'
    format_version = Attribute('version')
    name = Text('name')
    url = Child('url', URL, multiple=True)
    dob = Text('dob',
               encoder=datetime.date.isoformat,
               decoder=lambda s: datetime.datetime.strptime(s, '%Y-%m-%d').date())
    @format_version.encoder
    def format_version(self, value):
        return '.'.join(map(str, value))
    @format_version.decoder
    def format_version(self, value):
        return tuple(map(int, value.split('.')))
Parameters: |
- codec (Codec, collections.Callable) – an optional codec object to use. if it’s callable and
not an instance of Codec, its return value will
be used instead. it means this can take class object of
Codec subtype that is not instantiated yet
unless the constructor requires any arguments
- encoder (collections.Callable) – an optional function that encodes Python value into
XML text value e.g. str(). the encoder function
has to take an argument
- decoder (collections.Callable) – an optional function that decodes XML text value into
Python value e.g. int(). the decoder function
has to take a string argument
|
-
decode(text, instance)
Decode the given text as it’s programmed.
Parameters: |
- text (str) – the raw text to decode. xml attribute value or
text node value in most cases
- instance (Element) – the instance that is associated with the descriptor
|
Returns: | decoded value
|
-
decoder(function)
Decorator which sets the decoder to the decorated function:
import datetime
class Person(DocumentElement):
    '''Person.dob will be a datetime.date instance.'''
    __tag__ = 'person'
    dob = Text('dob')
    @dob.decoder
    def dob(self, dob_text):
        return datetime.datetime.strptime(dob_text, '%Y-%m-%d').date()
>>> p = Person('<person><dob>1987-07-26</dob></person>')
>>> p.dob
datetime.date(1987, 7, 26)
If it's applied multiple times, all decorated functions are piped
in order:
class Person(DocumentElement):
    '''Person.age will be an integer.'''
    __tag__ = 'person'
    age = Text('dob', decoder=lambda text: text.strip())
    @age.decoder
    def age(self, dob_text):
        return datetime.datetime.strptime(dob_text, '%Y-%m-%d').date()
    @age.decoder
    def age(self, dob):
        now = datetime.date.today()
        d = now.month < dob.month or (now.month == dob.month and
                                      now.day < dob.day)
        return now.year - dob.year - d
>>> p = read(Person, ['<person>\n\t<dob>\n\t\t1987-07-26\n\t</dob>\n</person>'])
>>> p.age
26
>>> datetime.date.today()
datetime.date(2013, 7, 30)
Note
This creates a copy of the descriptor instance rather than
modifying the descriptor in place.
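The piping described above can be pictured without libearth at all: each decoder receives the previous decoder's output. The sketch below chains the three steps of the age example as plain functions. The pipe() helper, the function names, and the fixed "today" date (used so the result doesn't depend on the clock) are illustrative assumptions, not libearth's implementation.

```python
import datetime
from functools import reduce


def pipe(*functions):
    """Return a function that applies *functions* left to right."""
    return lambda value: reduce(lambda acc, f: f(acc), functions, value)


def parse_date(text):
    # datetime.date has no strptime(); parse via datetime and take .date()
    return datetime.datetime.strptime(text, '%Y-%m-%d').date()


def make_age(today):
    def age(dob):
        # subtract one year if this year's birthday hasn't come yet
        before_birthday = (today.month, today.day) < (dob.month, dob.day)
        return today.year - dob.year - before_birthday
    return age


decode_age = pipe(str.strip, parse_date, make_age(datetime.date(2013, 7, 30)))
print(decode_age('\n\t\t1987-07-26\n\t'))  # 26
```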
-
encoder(function)
Decorator which sets the encoder to the decorated function:
import datetime
class Person(DocumentElement):
    '''Person.dob will be written to ISO 8601 format'''
    __tag__ = 'person'
    dob = Text('dob')
    @dob.encoder
    def dob(self, dob):
        if not isinstance(dob, datetime.date):
            raise TypeError('expected datetime.date')
        return dob.strftime('%Y-%m-%d')
>>> isinstance(p, Person)
True
>>> p.dob
datetime.date(1987, 7, 26)
>>> ''.join(write(p, indent='', newline=''))
'<person><dob>1987-07-26</dob></person>'
If it's applied multiple times, all decorated functions are piped
in order:
class Person(DocumentElement):
    '''Person.email will have mailto: prefix when it's written
    to XML.
    '''
    __tag__ = 'person'
    email = Text('email', encoder=lambda email: 'mailto:' + email)
    @email.encoder
    def email(self, email):
        return email.strip()
    @email.encoder
    def email(self, email):
        login, host = email.split('@', 1)
        return login + '@' + host.lower()
>>> isinstance(p, Person)
True
>>> p.email
' earthreader@librelist.com '
>>> ''.join(write(p, indent='', newline=''))
'<person><email>mailto:earthreader@librelist.com</email></person>'
Note
This creates a copy of the descriptor instance rather than
modifying the descriptor in place.
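When encoders are chained, their order matters: the documented output 'mailto:earthreader@librelist.com' only comes out if the value is stripped before the mailto: prefix is attached; otherwise the leading space ends up inside the URI. The stand-alone sketch below checks both orders with plain functions. The compose() helper and the function names are hypothetical, and the sketch makes no claim about the exact order in which libearth applies its encoders.

```python
def add_mailto(email):
    return 'mailto:' + email


def strip(email):
    return email.strip()


def lower_host(email):
    # lowercase only the host part; the login part is kept as-is
    login, host = email.split('@', 1)
    return login + '@' + host.lower()


def compose(*functions):
    """Apply *functions* to a value left to right."""
    def composed(value):
        for function in functions:
            value = function(value)
        return value
    return composed


raw = ' earthreader@librelist.com '
print(compose(strip, lower_host, add_mailto)(raw))
# mailto:earthreader@librelist.com
print(repr(compose(add_mailto, strip, lower_host)(raw)))
# 'mailto: earthreader@librelist.com'
```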
-
exception libearth.schema.CodecError
Raised when encoding/decoding between Python values and XML data
goes wrong.
-
class libearth.schema.Content(codec=None, encoder=None, decoder=None)
Declare possible text nodes as a descriptor.
Parameters: |
- codec (Codec, collections.Callable) – an optional codec object to use. if it's callable and
not an instance of Codec, its return value will
be used instead. this means it can take a Codec
subclass that is not instantiated yet, as long as
its constructor requires no arguments
- encoder (collections.Callable) – an optional function that encodes Python value into
XML text value e.g. str(). the encoder function
has to take an argument
- decoder (collections.Callable) – an optional function that decodes XML text value into
Python value e.g. int(). the decoder function
has to take a string argument
|
-
read(element, value)
Read raw value from XML, decode it, and then set the attribute
for content of the given element to the decoded value.
-
class libearth.schema.ContentHandler(document)
Event handler implementation for SAX parser.
It maintains a stack of parsing contexts: which element was most
recently opened, which descriptor is associated with that element,
and the buffer of content character chunks the element has.
Every context is represented as the namedtuple
ParserContext.
Each time its events (startElement(), characters(),
and endElement()) are called, it forwards the data to
the associated descriptor. Descriptor subtypes
implement the start_element() and end_element()
methods.
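The stack idea can be sketched with the standard library's xml.sax module. This minimal handler only collects the text of each element; unlike the real ContentHandler it forwards nothing to descriptors. Context and StackHandler are hypothetical names, not libearth's ParserContext or ContentHandler.

```python
import collections
import xml.sax

# Hypothetical context record: the open tag and its chunks of character data.
Context = collections.namedtuple('Context', ['tag', 'buffer'])


class StackHandler(xml.sax.ContentHandler):
    """Keep a stack of open elements and collect their character data."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.stack = []
        self.texts = []  # (tag, text) pairs, in end-element order

    def startElement(self, name, attrs):
        self.stack.append(Context(name, []))

    def characters(self, content):
        # characters() may be called several times for one text node
        if self.stack:
            self.stack[-1].buffer.append(content)

    def endElement(self, name):
        context = self.stack.pop()
        self.texts.append((context.tag, ''.join(context.buffer).strip()))


handler = StackHandler()
xml.sax.parseString(
    b'<person><name>Hong Minhee</name><dob>1987-07-26</dob></person>', handler)
print(handler.texts)
# [('name', 'Hong Minhee'), ('dob', '1987-07-26'), ('person', '')]
```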
-
exception libearth.schema.DecodeError
Raised when decoding XML data to Python values goes wrong.
-
class libearth.schema.Descriptor(tag, xmlns=None, required=False, multiple=False, sort_key=None, sort_reverse=None)
Abstract base class for Child and Text.
-
end_element(reserved_value, content)
Abstract method that is invoked when the parser meets an end
of an element related to the descriptor. It will be called by
ContentHandler.
Parameters: |
- reserved_value – the value that the start_element() method
returned
- content (str) – the content text of the read element
|
-
key_pair = None
(tuple) The pair of (xmlns, tag).
-
multiple = None
(bool) Whether the element can occur zero or more times.
If it's True, required has to be False.
-
required = None
(bool) Whether the element is required.
If it's True, multiple has to be False.
-
sort_key = None
(collections.Callable) An optional function to be used
for sorting multiple elements. It has to take an element and
return a value for the sort key. It is the same as the key option of
the sorted() built-in function.
It's available only when multiple is True.
Use sort_reverse for descending order.
Note
It doesn't guarantee that the elements stay sorted at
runtime, but all elements become sorted when they are written
using the write() function.
-
sort_reverse = None
(bool) Whether to reverse elements when they become
sorted. It is the same as the reverse option of the sorted()
built-in function.
It's available only when sort_key is present.
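Because sort_key and sort_reverse mirror the key and reverse options of sorted(), the order that write() is expected to produce can be previewed with sorted() directly. A minimal sketch, using a hypothetical Entry tuple as a stand-in for child elements:

```python
import collections
import operator

# Hypothetical stand-in for child elements with a `published` value.
Entry = collections.namedtuple('Entry', ['title', 'published'])
entries = [Entry('b', 2), Entry('a', 3), Entry('c', 1)]

sort_key = operator.attrgetter('published')  # what sort_key could be set to
sort_reverse = True                          # descending order

ordered = sorted(entries, key=sort_key, reverse=sort_reverse)
print([entry.title for entry in ordered])  # ['a', 'b', 'c']
```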
-
start_element(element, attribute)
Abstract method that is invoked when the parser meets a start
of an element related to the descriptor. It will be called by
ContentHandler.
Parameters: |
- element (Element) – the parent element of the read element
- attribute (str) – the attribute name of the descriptor
|
Returns: | a value to reserve. it will be passed to
reserved_value parameter of end_element()
|
-
tag = None
(str) The tag name.
-
xmlns = None
(str) The optional XML namespace URI.
-
exception libearth.schema.DescriptorConflictError
Raised when a schema has more than one descriptor for the same
attribute, the same child element, or the text node.
-
class libearth.schema.DocumentElement(_parent=None, **kwargs)
The root element of the document.
-
__tag__
(str) Every DocumentElement subtype has to set
this attribute to the root tag name.
-
__xmlns__
(str) A DocumentElement subtype may set this
attribute to the XML namespace of the document element.
-
class libearth.schema.Element(_parent=None, **attributes)
Represent an element in XML document.
It provides a default constructor which takes keyword arguments
and initializes the attributes with them.
For example, the following code that uses the default
constructor:
assert issubclass(Person, Element)
author = Person(
    name='Hong Minhee',
    url='http://dahlia.kr/'
)
is equivalent to the following code:
author = Person()
author.name = 'Hong Minhee'
author.url = 'http://dahlia.kr/'
-
classmethod __coerce_from__(value)
Cast a value which isn't an instance of the element type to
the element type. It's useful when a boxed element type could
be more naturally represented using a built-in type.
For example, Mark could be represented
as a boolean value, and Text also
could be represented as a string.
The following example shows how the element type can be
automatically cast from a string by implementing the
__coerce_from__() class method:
@classmethod
def __coerce_from__(cls, value):
    if isinstance(value, str):
        return Text(value=value)
    raise TypeError('expected a string or Text')
-
__entity_id__()
Identify the entity object. It returns the entity object itself
by default, but should be overridden.
Returns: | any value to identify the entity object |
-
__merge_entities__(other)
Merge two entities (self and other). It can return one
of the two, or even a new entity object. This method is used by
Session objects to merge conflicts between
concurrent updates.
Parameters: | other (Element) – other entity to merge. it's guaranteed to come from
the older session (note that this doesn't mean the entity
itself is older than self, only that its session's last
update is) |
Returns: | one of the two, or even a new entity object that merges
the two entities |
Return type: | Element |
Note
The default implementation simply returns self.
That means the entity of the newer session will always win
unless the method is overridden.
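As an illustration of overriding the default, the sketch below merges two versions of an entity by taking the union of their tags instead of simply returning self. Entity and its tags attribute are hypothetical, and the class is a plain object rather than an Element subclass, so it only shows the shape of the two hooks.

```python
class Entity(object):
    """Hypothetical entity identified by an id and carrying a set of tags."""

    def __init__(self, entity_id, tags):
        self.entity_id = entity_id
        self.tags = frozenset(tags)

    def __entity_id__(self):
        return self.entity_id

    def __merge_entities__(self, other):
        # The default behaviour would be `return self` (newer session wins).
        # Here both versions' tags survive in a new entity instead.
        return Entity(self.entity_id, self.tags | other.tags)


newer = Entity('urn:example:1', ['python'])
older = Entity('urn:example:1', ['xml'])
merged = newer.__merge_entities__(older)
print(sorted(merged.tags))  # ['python', 'xml']
```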
-
class libearth.schema.ElementList(element, descriptor, value_type=None)
List-like object to represent multiple children. It makes the parser
lazily consume the buffer when an element at a particular offset
is requested.
You can extend methods or properties for a particular element type
using the element_list_for() class decorator e.g.:
@element_list_for(Link)
class LinkList(collections.Sequence):
    '''Specialized ElementList for Link elements.'''
    def filter_by_mimetype(self, mimetype):
        '''Filter links by their mimetype.'''
        return [link for link in self if link.mimetype == mimetype]
Extended methods/properties can be used for element lists for the type:
assert isinstance(feed.links, LinkList)
assert isinstance(feed.links, ElementList)
feed.links.filter_by_mimetype('text/html')
-
consume_buffer()
Consume the buffer for the parser. It returns a generator,
so the caller can stop it early using a break statement.
-
classmethod register_specialized_type(value_type, specialized_type)
Register a specialized collections.Sequence type for
a particular value_type.
An imperative version of the element_list_for() class decorator.
Parameters: |
- value_type (type) – a particular element type that specialized_type
would be used for instead of default
ElementList class.
it has to be a subtype of Element
- specialized_type (type) – a collections.Sequence type which
extends methods and properties for
value_type
|
-
specialized_types = {<class 'libearth.feed.Link'>: (<class 'libearth.feed.LinkList'>, None)}
(collections.MutableMapping) The internal table for
specialized subtypes used by register_specialized_type()
method and element_list_for() class decorator.
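The registry mechanism can be sketched with a class-level dictionary and a decorator that fills it. This is a minimal sketch under stated assumptions: ElementListSketch, for_type(), and the Link tuple are hypothetical stand-ins; it uses collections.abc (Python 3.3+) rather than collections.Sequence; and it mixes the specialized type in by building a subclass per call, which libearth may do differently (and likely with caching).

```python
import collections
import collections.abc


class ElementListSketch(collections.abc.Sequence):
    """Hypothetical stand-in for ElementList: a read-only sequence."""

    # value_type -> specialized Sequence subclass (cf. specialized_types)
    specialized_types = {}

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    @classmethod
    def register_specialized_type(cls, value_type, specialized_type):
        cls.specialized_types[value_type] = specialized_type

    @classmethod
    def for_type(cls, value_type, items):
        """Build a list for value_type, mixing in its specialized type if any."""
        specialized = cls.specialized_types.get(value_type)
        if specialized is None:
            return cls(items)

        class Specialized(specialized, cls):
            pass

        return Specialized(items)


def element_list_for(value_type):
    """Class decorator: register the decorated Sequence type for value_type."""
    def decorator(specialized_type):
        ElementListSketch.register_specialized_type(value_type, specialized_type)
        return specialized_type
    return decorator


# Usage, with a hypothetical Link tuple standing in for libearth.feed.Link.
Link = collections.namedtuple('Link', ['uri', 'mimetype'])


@element_list_for(Link)
class LinkList(collections.abc.Sequence):
    def filter_by_mimetype(self, mimetype):
        return [link for link in self if link.mimetype == mimetype]


links = ElementListSketch.for_type(
    Link, [Link('a', 'text/html'), Link('b', 'application/atom+xml')])
print(isinstance(links, LinkList), isinstance(links, ElementListSketch))
# True True
print(links.filter_by_mimetype('text/html'))
# [Link(uri='a', mimetype='text/html')]
```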
-
exception libearth.schema.EncodeError
Raised when encoding Python values into XML data goes wrong.
-
exception libearth.schema.IntegrityError
Raised when an element is invalid according to the schema.
-
exception libearth.schema.SchemaError
Raised when a schema definition has logical errors.
-
class libearth.schema.Text(tag, codec=None, xmlns=None, required=False, multiple=False, encoder=None, decoder=None, sort_key=None, sort_reverse=None)
Descriptor that declares a possible child element that only consists
of character data. All other attributes and child nodes are ignored.
Parameters: |
- tag (str) – the XML tag name
- codec (Codec, collections.Callable) – an optional codec object to use. if it's callable and
not an instance of Codec, its return value will
be used instead. this means it can take a Codec
subclass that is not instantiated yet, as long as
its constructor requires no arguments
- xmlns (str) – an optional XML namespace URI
- required (bool) – whether the child is required or not.
it’s exclusive to multiple.
False by default
- multiple (bool) – whether the child can be multiple.
it’s exclusive to required.
False by default
- encoder (collections.Callable) – an optional function that encodes Python value into
XML text value e.g. str(). the encoder function
has to take an argument
- decoder (collections.Callable) – an optional function that decodes XML text value into
Python value e.g. int(). the decoder function
has to take a string argument
- sort_key (collections.Callable) – an optional function to be used for sorting
multiple child elements. it has to take a child as
Element and return a value for sort key.
it is the same as the key option of the sorted()
built-in function.
note that it doesn't guarantee that the elements stay
sorted at runtime, but all elements become sorted
when they are written using the write() function.
it's available only when multiple is True.
use sort_reverse for descending order.
- sort_reverse (bool) – whether to reverse elements when they become
sorted. it is the same as the reverse option of the
sorted() built-in function.
it's available only when sort_key is present
|
-
libearth.schema.complete(element)
Completely load the given element.
-
class libearth.schema.element_list_for(value_type)
Class decorator which registers specialized ElementList
subclass for a particular value_type e.g.:
@element_list_for(Link)
class LinkList(collections.Sequence):
    '''Specialized ElementList for Link elements.'''
    def filter_by_mimetype(self, mimetype):
        '''Filter links by their mimetype.'''
        return [link for link in self if link.mimetype == mimetype]
Parameters: | value_type (type) – a particular element type that specialized_type
would be used for instead of default
ElementList class.
it has to be a subtype of Element |
-
libearth.schema.index_descriptors(element_type)
Index descriptors of the given element_type to make them
easy to be looked up by their identifiers (pairs of XML namespace URI
and tag name).
Parameters: | element_type (type) – a subtype of Element
to index its descriptors |
-
libearth.schema.inspect_attributes(element_type)
Get the dictionary of Attribute descriptors of
the given element_type.
Parameters: | element_type (type) – a subtype of Element to inspect |
Returns: | a dictionary of attribute identifiers (pairs of
xml namespace uri and xml attribute name) to pairs of
instance attribute name and associated Attribute
descriptor |
Return type: | collections.Mapping |
-
libearth.schema.inspect_child_tags(element_type)
Get the dictionary of Descriptor objects of
the given element_type.
Parameters: | element_type (type) – a subtype of Element to inspect |
Returns: | a dictionary of child node identifiers (pairs of
xml namespace uri and tag name) to pairs of
instance attribute name and associated Descriptor |
Return type: | collections.Mapping |
-
libearth.schema.inspect_content_tag(element_type)
Get the Content descriptor of the given element_type.
Parameters: | element_type (type) – a subtype of Element to inspect |
Returns: | a pair of instance attribute name and associated
Content descriptor |
Return type: | tuple |
-
libearth.schema.inspect_xmlns_set(element_type)
Get the set of XML namespaces used in the given element_type,
recursively including all child elements.
Parameters: | element_type (type) – a subtype of Element to inspect |
Returns: | a set of URI strings of all XML namespaces used |
Return type: | collections.Set |
-
libearth.schema.is_partially_loaded(element)
Return whether the given element is not completely loaded
by read() yet.
Parameters: | element (Element) – an element |
Returns: | True if the given element is partially
loaded |
Return type: | bool |
-
libearth.schema.read(cls, iterable)
Initialize a document in read mode by opening the iterable
of XML strings.
with open('doc.xml', 'rb') as f:
    person = read(Person, f)
Returned document element is not fully read but partially loaded
into memory, and then lazily (and eventually) loaded when these
are actually needed.
Parameters: |
- cls (type) – a subtype of DocumentElement
- iterable (collections.Iterable) – chunks of XML string to read
|
Returns: | initialized document element in read mode
|
Return type: | DocumentElement
|
-
libearth.schema.validate(element, recurse=True, raise_error=True)
Validate the given element according to the schema.
from libearth.schema import IntegrityError, validate
try:
    validate(element)
except IntegrityError:
    print('the element {0!r} is invalid!'.format(element))
Parameters: |
- element (Element) – the element object to validate
- recurse (bool) – recursively validate the whole tree (child nodes).
True by default
- raise_error (bool) – raise exception when the element is invalid.
if it’s False it returns False
instead of raising an exception.
True by default
|
Returns: | True if the element is valid.
False if the element is invalid and
raise_error option is False
|
Raises IntegrityError: |
| when the element is invalid and
raise_error option is True
|
-
class libearth.schema.write(document, validate=True, indent='    ', newline='\n', canonical_order=False, hints=True, as_bytes=None)
Write the given document to XML string. The return value is
an iterator that yields chunks of an XML string.
with open('doc.xml', 'w') as f:
    for chunk in write(document):
        f.write(chunk)
Parameters: |
- document (DocumentElement) – the document element to serialize
- validate (bool) – whether validate the document or not.
True by default
- indent (str) – an optional string to be used for indentation.
default is four spaces ('    ')
- newline (str) – an optional character to be used for newline.
default is '\n'
- canonical_order (bool) – make the order of attributes and child nodes
consistent to any python versions and
implementations. useful for testing.
False by default
- hints (bool) – export hint values as well. hints improve the efficiency of
read(). True by default
- as_bytes – return chunks as bytes (str in Python 2)
if True. return chunks as text (unicode in Python 2,
str in Python 3) if False.
return chunks as the default string type (str)
by default
|
Returns: | chunks of an XML string
|
Return type: | collections.Iterable
|
libearth.session — Isolate data from other installations
This module provides merging facilities to avoid conflict between concurrent
updates of the same document/entity from different devices (installations).
There are several concepts here.
Session abstracts installations on devices. For example, if you
have a laptop, a tablet, and a mobile phone, and two apps are installed on
the laptop, then there have to be four sessions: laptop-1, laptop-2,
tablet-1, and phone-1. You can think of a session as a branch if you are
familiar with DVCS.
Revision abstracts the timestamp of an update. Importantly,
it preserves its session as well.
Base revisions (MergeableDocumentElement.__base_revisions__) show
what revisions the current revision is built on top of. In other words,
what revisions were merged into the current revision. RevisionSet
is a dictionary-like data structure to represent them.
-
libearth.session.SESSION_XMLNS = 'http://earthreader.org/session/'
(str) The XML namespace name used for session metadata.
-
class libearth.session.MergeableDocumentElement(_parent=None, **kwargs)
Document element which is mergeable using Session.
-
class libearth.session.Revision
The named tuple type of (Session, datetime.datetime) pair.
-
session
Alias for field number 0
-
updated_at
Alias for field number 1
-
class libearth.session.RevisionCodec
Codec to encode/decode Revision pairs.
>>> import datetime
>>> from libearth.session import Revision, RevisionCodec, Session
>>> from libearth.tz import utc
>>> session = Session('test-identifier')
>>> updated_at = datetime.datetime(2013, 9, 22, 3, 43, 40, tzinfo=utc)
>>> rev = Revision(session, updated_at)
>>> RevisionCodec().encode(rev)
'test-identifier 2013-09-22T03:43:40Z'
-
RFC3339_CODEC = <libearth.codecs.Rfc3339 object at 0x7f87c5234cd0>
(Rfc3339) The internally used codec to encode
Revision.updated_at time to RFC 3339 format.
-
class libearth.session.RevisionParserHandler
SAX content handler that picks session metadata
(__revision__ and
__base_revisions__) from the given
document element.
The parsed results are stored in revision and base_revisions.
Used by parse_revision().
-
done = None
(bool) Represents whether the parsing is complete.
-
revision = None
(Revision) The parsed
__revision__. It might be
None.
-
class libearth.session.RevisionSet(revisions=[])
Set of Revision pairs. It provides dictionary-like
mapping protocol.
-
contains(revision)
Find whether the given revision is already merged into
the revision set. In other words, return True
if the revision doesn't have to be merged into the revision set
anymore.
Parameters: | revision (Revision) – the revision to find whether it has to be merged
or not |
Returns: | True if the revision is included in
the revision set, or False |
Return type: | bool |
-
copy()
Make a copy of the set.
-
items()
The list of (Session, datetime.datetime) pairs.
Returns: | the list of Revision instances |
Return type: | collections.ItemsView |
-
merge(*sets)
Merge two or more RevisionSets. The latest time
remains for the same session.
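A dictionary keyed by session is enough to sketch the "latest time wins" rule of merge() and the contains() check. Assumptions: sessions are plain identifier strings instead of Session objects, RevisionSetSketch is a hypothetical name, and contains() treats a revision as already merged when the set knows the same session at the same time or later; libearth's exact rule may differ.

```python
import collections
import datetime

# Sessions are plain identifier strings here instead of Session objects.
Revision = collections.namedtuple('Revision', ['session', 'updated_at'])


class RevisionSetSketch(object):
    """Hypothetical stand-in for RevisionSet: session -> latest update time."""

    def __init__(self, revisions=()):
        self._latest = {}
        for session, updated_at in revisions:
            self._add(session, updated_at)

    def _add(self, session, updated_at):
        # keep only the latest time for each session
        current = self._latest.get(session)
        if current is None or updated_at > current:
            self._latest[session] = updated_at

    def items(self):
        return [Revision(s, t) for s, t in self._latest.items()]

    def merge(self, *sets):
        """Return a new merged set; the inputs are left untouched."""
        merged = RevisionSetSketch(self.items())
        for other in sets:
            for session, updated_at in other.items():
                merged._add(session, updated_at)
        return merged

    def contains(self, revision):
        # assumed rule: already merged if the same session is known
        # at the same time or later
        known = self._latest.get(revision.session)
        return known is not None and known >= revision.updated_at


utc = datetime.timezone.utc
a = RevisionSetSketch([('a', datetime.datetime(2013, 9, 22, 16, 58, 57, tzinfo=utc))])
b = RevisionSetSketch([('a', datetime.datetime(2013, 9, 22, 17, 0, 0, tzinfo=utc)),
                       ('b', datetime.datetime(2013, 9, 22, 16, 59, 30, tzinfo=utc))])
merged = a.merge(b)
print(merged.contains(Revision('a', datetime.datetime(2013, 9, 22, 16, 59, tzinfo=utc))))
# True
```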
-
class libearth.session.RevisionSetCodec
Codec to encode/decode multiple Revision pairs.
>>> from datetime import datetime
>>> from libearth.tz import utc
>>> revs = RevisionSet([
... (Session('a'), datetime(2013, 9, 22, 16, 58, 57, tzinfo=utc)),
... (Session('b'), datetime(2013, 9, 22, 16, 59, 30, tzinfo=utc)),
... (Session('c'), datetime(2013, 9, 22, 17, 0, 30, tzinfo=utc))
... ])
>>> encoded = RevisionSetCodec().encode(revs)
>>> encoded
'c 2013-09-22T17:00:30Z,\nb 2013-09-22T16:59:30Z,\na 2013-09-22T16:58:57Z'
>>> RevisionSetCodec().decode(encoded)
libearth.session.RevisionSet([
Revision(session=libearth.session.Session('b'),
updated_at=datetime.datetime(2013, 9, 22, 16, 59, 30,
tzinfo=libearth.tz.Utc())),
Revision(session=libearth.session.Session('c'),
updated_at=datetime.datetime(2013, 9, 22, 17, 0, 30,
tzinfo=libearth.tz.Utc())),
Revision(session=libearth.session.Session('a'),
updated_at=datetime.datetime(2013, 9, 22, 16, 58, 57,
tzinfo=libearth.tz.Utc()))
])
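The textual format in the two codec examples (an identifier, a space, and a UTC timestamp ending in Z; pairs joined by ",\n", newest first) can be reproduced with the standard library alone. A minimal sketch under stated assumptions: the separator regex is a guess, not libearth's SEPARATOR_PATTERN; only second precision and the Z suffix are handled; and identifiers are assumed to contain no spaces.

```python
import datetime
import re

UTC = datetime.timezone.utc
SEPARATOR = re.compile(r'\s*,\s*')  # assumed separator pattern


def encode_revision(identifier, updated_at):
    # normalize to UTC and print with a trailing Z
    stamp = updated_at.astimezone(UTC).strftime('%Y-%m-%dT%H:%M:%SZ')
    return '{0} {1}'.format(identifier, stamp)


def encode_revision_set(pairs):
    # newest first, joined by ",\n" like the RevisionSetCodec example
    ordered = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return ',\n'.join(encode_revision(i, t) for i, t in ordered)


def decode_revision_set(text):
    pairs = []
    for chunk in SEPARATOR.split(text.strip()):
        identifier, stamp = chunk.split(' ', 1)
        updated_at = datetime.datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ')
        pairs.append((identifier, updated_at.replace(tzinfo=UTC)))
    return pairs


pairs = [('a', datetime.datetime(2013, 9, 22, 16, 58, 57, tzinfo=UTC)),
         ('b', datetime.datetime(2013, 9, 22, 16, 59, 30, tzinfo=UTC)),
         ('c', datetime.datetime(2013, 9, 22, 17, 0, 30, tzinfo=UTC))]
encoded = encode_revision_set(pairs)
print(repr(encoded))
# 'c 2013-09-22T17:00:30Z,\nb 2013-09-22T16:59:30Z,\na 2013-09-22T16:58:57Z'
```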
-
SEPARATOR_PATTERN = <_sre.SRE_Pattern object at 0x7f87c520e670>
(re.RegexObject) The regular expression pattern that matches
to separator substrings between revision pairs.
-
class libearth.session.Session
The unit of device (more abstractly, installation) that updates
the same document (e.g. Feed). Every session
must have its own unique identifier to avoid conflict between
concurrent updates from different sessions.
Parameters: | identifier (str) – the unique identifier. automatically generated
using uuid if not present |
-
IDENTIFIER_PATTERN = <_sre.SRE_Pattern object at 0x7f87c520e5a8>
(re.RegexObject) The regular expression pattern that matches
to allowed identifiers.
-
identifier = None
(str) The session identifier. It has to be distinguishable
from other devices/apps, but consistent for the same device/app.
-
interns = {}
(collections.MutableMapping) The pool of interned sessions.
It ensures that the same identifier always maps to a single session object.
-
merge(a, b, force=False)
Merge the given two documents and return new merged document.
The given documents are not manipulated in place. Two documents
must have the same type.
Parameters: |
- a (MergeableDocumentElement) – the first document to be merged
- b (MergeableDocumentElement) – the second document to be merged
- force – by default (False) it doesn't merge but
simply pulls a or b if one already contains the other.
if force is True it always merges
the two. it assumes b is newer than a
|
-
pull(document)
Pull the document (of possibly other session) to the current
session.
Parameters: | document (MergeableDocumentElement) – the document to pull from the possibly other session
to the current session |
Returns: | the clone of the given document with the replaced
__revision__.
note that the Revision.updated_at value won’t
be revised. it could be the same object as the given
document object if the session is the same |
Return type: | MergeableDocumentElement |
-
revise(document)
Mark the given document as the latest revision of the current
session.
-
libearth.session.ensure_revision_pair(pair, force_cast=False)
Check the type of the given pair and raise an error unless it's a valid
revision pair (Session, datetime.datetime).
Parameters: |
- pair (collections.Sequence) – a value to check
- force_cast (bool) – whether to return the casted value to Revision
named tuple type
|
Returns: | the revision pair
|
Return type: | Revision, collections.Sequence
|
-
libearth.session.parse_revision(iterable)
Efficiently parse only __revision__
and __base_revisions__ from the given
iterable which contains chunks of XML. It reads only the head of
the given document, so iterable will not be completely consumed
in most cases.
Note that it doesn’t validate the document.
Parameters: | iterable (collections.Iterable) – chunks of bytes which contain
a MergeableDocumentElement element |
Returns: | a pair of (__revision__,
__base_revisions__).
it might be None if the document is not stamped |
Return type: | collections.Sequence |
libearth.stage — Staging updates and transactions
Stage is similar to Git's concept of the same name. It's a unit of updates,
so every change to the repository should be done through a stage.
It also does more than Git's stage does: routing (Route). The routing
system hides how documents are stored in the repository, and provides
a natural object-mapping interface instead.
Stage also provides transactions. All operations on staged documents should
be done within a transaction. You can open and close a transaction using
with statement e.g.:
with stage:
    subs = stage.subscriptions
    stage.subscriptions = some_operation(subs)
When a transaction is committed, it merges all simultaneous updates, so you
can easily achieve thread safety using transactions.
Note, however, that it doesn't guarantee data integrity between multiple
processes, so you have to use a different session id for each process
when there are multiple processes.
-
class libearth.stage.BaseStage(session, repository)
Base stage class that routes nothing yet. It should be subclassed
to route document types. See also the Route class.
It's a context manager, so it can be used with the with
statement. The context maintains a transaction, which is required for
all operations related to the stage:
with stage:
    v = stage.some_value
    stage.some_value = operate(v)
If no transaction is ongoing while an operation requires one,
it raises TransactionError.
Parameters: |
- session (Session) – the current session to stage
- repository (Repository) – the repository to stage
|
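The transaction behaviour described above (one transaction per context, writes buffered until commit, TransactionError when none or a second one is open) can be sketched as a small context manager over a plain dict. StageSketch is hypothetical; it keys transactions by threading.get_ident() only, and discarding the buffer when the with block raises is an assumption of this sketch rather than documented libearth behaviour.

```python
import threading


class TransactionError(Exception):
    """Raised when no transaction is open, or one is already open."""


class StageSketch(object):
    """Hypothetical per-thread transaction context around a dict repository."""

    def __init__(self, repository):
        self.repository = repository
        self.lock = threading.RLock()
        self.transactions = {}  # thread id -> dirty buffer (a dict)

    def __enter__(self):
        context_id = threading.get_ident()
        with self.lock:
            if context_id in self.transactions:
                raise TransactionError('a transaction is already in progress')
            self.transactions[context_id] = {}
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        with self.lock:
            buffer = self.transactions.pop(threading.get_ident())
            if exc_type is None:
                # commit: flush buffered writes to the repository
                self.repository.update(buffer)
        return False  # never swallow exceptions

    def _buffer(self):
        try:
            return self.transactions[threading.get_ident()]
        except KeyError:
            raise TransactionError('no ongoing transaction') from None

    def read(self, key):
        buffer = self._buffer()
        return buffer[key] if key in buffer else self.repository[key]

    def write(self, key, value):
        self._buffer()[key] = value


repo = {'count': 0}
stage = StageSketch(repo)
with stage:
    stage.write('count', stage.read('count') + 1)
print(repo)  # {'count': 1}
```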
-
SESSION_DIRECTORY_KEY = ['.sessions']
(collections.Sequence) The repository key of the directory
where the session list is stored.
-
get_current_transaction(pop=False)
Get the current ongoing transaction. If no transaction has
begun yet, it raises TransactionError.
Returns: | the dirty buffer that should be written when the transaction
is committed |
Return type: | DirtyBuffer |
Raises TransactionError: |
| if no transaction has begun yet |
-
read(document_type, key)
Read a document of document_type by the given key
in the staged repository.
Note
This method is intended to be internal. Use routed properties
rather than this. See also Route.
-
repository = None
(Repository) The staged repository.
-
session = None
(Session) The current session of the stage.
-
sessions
(collections.Set) List all sessions associated to
the repository. It includes the session of the current stage.
-
touch()
Touch the latest staged time of the current session
into the repository.
Note
This method is intended to be internal.
-
transactions = None
(collections.MutableMapping) Ongoing transactions. Keys are
the context identifier (that get_current_context_id() returns),
and values are pairs of the DirtyBuffer that should be written
when the transaction is committed, and stack information.
-
write(key, document, merge=True)
Save the document to the key in the staged
repository.
Parameters: |
- key (collections.Sequence) – the key to be stored
- document (MergeableDocumentElement) – the document to save
- merge (bool) – merge with the previous revision of the same session
(if exists). True by default
|
Returns: | actually written document
|
Return type: | MergeableDocumentElement
|
Note
This method is intended to be internal. Use routed properties
rather than this. See also Route.
-
class libearth.stage.Directory(stage, document_type, key_spec, indices, key)
Mapping object which represents the hierarchy of a routed key path.
Parameters: |
- stage (BaseStage) – the current stage
- document_type (type) – the same to Route.document_type
- key_spec (collections.Sequence) – the same to Route.key_spec value
- indices (collections.Sequence) – the upper indices that are already completed
- key (collections.Sequence) – the upper key that is already completed
|
Note
The constructor is intended to be internal, so don't instantiate
it directly. Use Route instead.
-
class libearth.stage.DirtyBuffer(repository, lock)
Memory-buffered proxy for the repository. It's used as the transaction
buffer, which holds updates to be written until the ongoing transaction
is committed.
Parameters: |
- repository (Repository) – the bare repository where the buffer will
flush() to
- lock (threading.RLock) – the common lock shared between dirty buffers of the same stage
|
Note
This class is intended to be internal.
-
flush(_dictionary=None, _key=None)
Flush all buffered updates to the repository.
-
repository = None
(Repository) The bare repository where
the buffer will flush() to.
-
class libearth.stage.Route(document_type, key_spec)
Descriptor that routes a document_type to a particular key
path pattern in the repository.
key_spec could contain some format strings. Format strings can
take a keyword (session) and zero or more positional arguments.
For example, if you route a document type without any positional
arguments in key_spec format:
class Stage(BaseStage):
    '''Stage example.'''
    metadata = Route(
        Metadata,
        ['metadata', '{session.identifier}.xml']
    )
A Stage instance will have a metadata attribute that simply holds
a Metadata document instance (in the example):
>>> stage.metadata # ['metadata', 'session-id.xml']
<Metadata ...>
If you route something with one or more positional arguments in
key_spec format, then it works differently:
class Stage(BaseStage):
    '''Stage example.'''
    seating_chart = Route(
        Student,
        ['students', 'col-{0}', 'row-{1}', '{session.identifier}.xml']
    )
In the above routing, two positional arguments are used. This means that
the seating_chart property will return a two-dimensional mapping object
(Directory):
>>> stage.seating_chart # ['students', ...]
<libearth.directory.Directory ['students']>
>>> list(stage.seating_chart)
['A', 'B', 'C', 'D']
>>> stage.seating_chart['B']  # ['students', 'col-B', ...]
<libearth.directory.Directory ['students', 'col-B']>
>>> list(stage.seating_chart['B'])
['1', '2', '3', '4', '5', '6']
>>> stage.seating_chart['B']['6'] \
... # ['students', 'col-B', 'row-6', 'session-id.xml']
<Student B6>
Parameters: |
- document_type (type) – the type of document to route.
it has to be a subclass of
MergeableDocumentElement
- key_spec (collections.Sequence) – the repository key pattern that might contain some
format strings
e.g. ['docs', '{0}', '{session.identifier}.xml'].
positional values are used for directory indices
(if present), and the session keyword value is used
for identifying sessions
|
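Since key_spec parts are ordinary format strings, the concrete repository key for given directory indices can be sketched with str.format(). Assumptions: a types.SimpleNamespace stands in for a Session (only identifier is used), and resolve_key() is a hypothetical helper; Route itself may build keys differently.

```python
import types

# Hypothetical stand-in for a Session; only `identifier` is used.
session = types.SimpleNamespace(identifier='session-id')

key_spec = ['students', 'col-{0}', 'row-{1}', '{session.identifier}.xml']


def resolve_key(key_spec, session, *indices):
    """Fill positional indices and the session keyword into each key part."""
    # str.format() ignores positional arguments a part doesn't reference
    return [part.format(*indices, session=session) for part in key_spec]


print(resolve_key(key_spec, session, 'B', '6'))
# ['students', 'col-B', 'row-6', 'session-id.xml']
```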
-
document_type = None
(type) The type of the routed document. It is a subtype of
MergeableDocumentElement.
-
key_spec = None
(collections.Sequence) The repository key pattern that
might contain some format strings.
-
class libearth.stage.Stage(session, repository)
Staged documents of Earth Reader.
-
feeds
(collections.MutableMapping) The map of feed ids to
Feed objects.
-
subscriptions
(SubscriptionList) The set of subscriptions.
-
exception libearth.stage.TransactionError
Raised if there's no ongoing transaction while one is
needed to update the stage, or if a transaction has already begun
when a new transaction is attempted.
-
libearth.stage.compile_format_to_pattern(format_string)
Compile a format_string to regular expression pattern.
For example, 'string{0}like{1}this{{2}}' will be compiled to
/^string(.*?)like(.*?)this\{2\}$/.
Parameters: | format_string (str) – format string to compile |
Returns: | compiled pattern object |
Return type: | re.RegexObject |
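One way to get the documented result is to walk the format string with string.Formatter().parse(), escaping literal text and turning every replacement field into a lazy group; escaped braces such as {{2}} arrive as literal text and are escaped. This is a minimal sketch of that technique, not necessarily libearth's implementation; format specs and conversions are simply ignored.

```python
import re
import string


def compile_format_to_pattern(format_string):
    """Turn every replacement field into a lazy group; escape the rest."""
    parts = ['^']
    for literal, field_name, _spec, _conversion in string.Formatter().parse(format_string):
        parts.append(re.escape(literal))
        if field_name is not None:
            parts.append('(.*?)')
    parts.append('$')
    return re.compile(''.join(parts))


pattern = compile_format_to_pattern('string{0}like{1}this{{2}}')
print(pattern.pattern)  # ^string(.*?)like(.*?)this\{2\}$
print(pattern.match('stringAlikeBthis{2}').groups())  # ('A', 'B')
```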
-
libearth.stage.get_current_context_id()
Identify which context it is (greenlet, stackless, or thread).
Returns: | the identifier of the current context |
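A rough sketch of the idea: prefer the current greenlet if the optional greenlet package is installed, otherwise fall back to the current thread. Stackless support is omitted, and using id() of the greenlet object is an assumption of this sketch (ids of finished greenlets can be reused), so it should not be read as libearth's implementation.

```python
import threading

try:
    import greenlet  # optional; only used if installed
except ImportError:
    greenlet = None


def get_current_context_id():
    """Return an identifier for the current greenlet or, failing that, thread."""
    if greenlet is not None:
        return id(greenlet.getcurrent())
    return threading.get_ident()


main_id = get_current_context_id()
ids = []
worker = threading.Thread(target=lambda: ids.append(get_current_context_id()))
worker.start()
worker.join()
print(ids[0] != main_id)  # True
```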
Maintain the subscription list using the OPML format, which is the de facto
standard for the purpose.
-
class libearth.subscribe.Body(_parent=None, **attributes)
Bases: libearth.schema.Element
Represent the body element of an OPML document.
-
children
(collections.MutableSequence) Child Outline objects.
-
class libearth.subscribe.Category(_parent=None, **attributes)
Bases: libearth.subscribe.Outline, libearth.subscribe.SubscriptionSet
Category which groups Subscription objects or other
Category objects. It implements collections.MutableSet
protocol.
-
children
(collections.MutableSequence) The list of child Outline
elements. It’s for internal use.
-
class libearth.subscribe.CommaSeparatedList
Bases: libearth.schema.Codec
Encode strings e.g. ['a', 'b', 'c'] into a comma-separated list
e.g. 'a,b,c', and decode it back to a Python list. Whitespace
around commas is ignored when decoding.
>>> codec = CommaSeparatedList()
>>> codec.encode(['technology', 'business'])
'technology,business'
>>> codec.decode('technology, business')
['technology', 'business']
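The behaviour shown in the doctest can be reproduced by a small stand-alone class. CommaSeparatedListSketch is a hypothetical name and deliberately does not subclass libearth.schema.Codec; treating an empty value as '' and an empty string as [] is an assumption, not documented behaviour.

```python
class CommaSeparatedListSketch(object):
    """Hypothetical stand-alone codec mirroring the doctest above."""

    def encode(self, value):
        # None or an empty list becomes an empty string (an assumption).
        if not value:
            return ''
        return ','.join(value)

    def decode(self, text):
        # An empty attribute decodes to an empty list (an assumption).
        if not text:
            return []
        # Strip whitespace around each item, e.g. 'a, b' -> ['a', 'b'].
        return [item.strip() for item in text.split(',')]
```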
-
class libearth.subscribe.Head(_parent=None, **attributes)
Bases: libearth.schema.Element
Represent head element of OPML document.
-
owner_email
(str) The owner’s email.
-
owner_name
(str) The owner’s name.
-
owner_uri
(str) The owner’s website url.
-
title
(str) The title of the subscription list.
-
class libearth.subscribe.Outline(_parent=None, **attributes)
Bases: libearth.schema.Element
Represent outline element of OPML document.
-
created_at
(datetime.datetime) The created time.
-
deleted
(bool) Whether it is deleted (archived) or not.
-
deleted_at
(datetime.datetime) The archived time, if deleted ever.
It could be None as well if it’s never deleted.
Note that it doesn’t have enough information about whether
it’s actually deleted or not. For that you have to use
deleted property instead.
-
label
(str) The human-readable text of the outline.
-
type
(str) Internally-used type identifier.
-
class libearth.subscribe.Subscription(_parent=None, **attributes)
Bases: libearth.subscribe.Outline
Subscription which holds the feed_uri it refers to.
-
feed_id
(str) The feed identifier to be used for lookup.
It’s intended to be the SHA1 digest of the Feed.id value, encoded in UTF-8.
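Computing such an identifier takes one hashlib call. make_feed_id() is a hypothetical helper, and using the hexadecimal form of the digest is an assumption; the text above only says SHA1 of the UTF-8 encoded Feed.id.

```python
import hashlib


def make_feed_id(feed_id_value):
    """Hypothetical helper: SHA1 digest of a Feed.id value, as hex text."""
    # Feed.id is text; encode it as UTF-8 before hashing.
    return hashlib.sha1(feed_id_value.encode('utf-8')).hexdigest()


print(make_feed_id('abc'))  # a9993e364706816aba3e25717850c26c9cd0d89d
```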
-
feed_uri
(str) The feed url.
-
alternate_uri
(str) The web page url.
-
icon_uri
(str) Optional favicon url.
-
class libearth.subscribe.SubscriptionList(_parent=None, **kwargs)
Bases: libearth.session.MergeableDocumentElement, libearth.subscribe.SubscriptionSet
The set (more precisely, tree) of subscriptions. It consists of
Subscription and Category objects for grouping.
It implements collections.MutableSet protocol.
-
owner
(Person) The owner of the subscription
list.
-
title
(str) The title of the subscription list.
-
version
(distutils.version.StrictVersion) The OPML version number.
-
class libearth.subscribe.SubscriptionSet
Bases: _abcoll.MutableSet
Mixin for SubscriptionList and Category, both of which can
group Subscription objects and other Category objects,
to implement the collections.MutableSet protocol.
-
categories
(collections.Mapping) Label to Category instance
mapping.
-
children
(collections.MutableSequence) Child Outline
objects.
-
contains(outline, recursively=False)
Determine whether the set contains the given outline.
If recursively is False (the default)
it works the same way as the in operator.
Parameters: |
- outline (Outline) – the subscription or category to find
- recursively (bool) – if it’s True find the outline
in the whole tree, or if False find
it in only its direct children.
False by default
|
Returns: | True if the set (or tree) contains the given
outline, or False
|
Return type: | bool
|
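The mixin idea and the recursive lookup can be sketched with collections.abc.MutableSet over an ordered children list. OutlineSet is a hypothetical stand-in, not libearth's SubscriptionSet. Membership is checked by identity because collections.abc.Set defines __eq__ so that, for example, two empty categories compare equal, which would make an ==-based lookup report false matches.

```python
from collections.abc import MutableSet


class OutlineSet(MutableSet):
    """Hypothetical stand-in: set semantics over an ordered children list."""

    def __init__(self, label=None):
        self.label = label
        self.children = []

    def __contains__(self, outline):
        # Identity, not ==, so distinct but equal-looking sets stay distinct.
        return any(child is outline for child in self.children)

    def __iter__(self):
        return iter(self.children)

    def __len__(self):
        return len(self.children)

    def add(self, outline):
        if outline not in self:
            self.children.append(outline)

    def discard(self, outline):
        self.children[:] = [c for c in self.children if c is not outline]

    def contains(self, outline, recursively=False):
        # Without recursion this is exactly the ``in`` operator.
        if outline in self:
            return True
        if not recursively:
            return False
        # Otherwise descend into nested sets (categories) as well.
        return any(
            isinstance(child, OutlineSet)
            and child.contains(outline, recursively=True)
            for child in self.children
        )
```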
-
subscribe(feed, icon_uri=None)
Add a subscription from a Feed instance.
Prefer this method over the add() method.
Parameters: |
- feed (Feed) – feed to subscribe
- icon_uri (str) – optional favicon url of the feed
|
Returns: | the created subscription object
|
Return type: | Subscription
|
New in version 0.3.0: Optional icon_uri parameter was added.
-
subscriptions
(collections.Set) The subset which consists of only
Subscription instances.
libearth.tz — Basic timezone implementations
Almost all of this module is taken from the official documentation of
the datetime module in the Python standard library.
-
libearth.tz.utc
(Utc, datetime.timezone) The tzinfo
instance that represents UTC. It’s an instance of Utc
in Python 2 (which provides no built-in fixed-offset
tzinfo implementation), and an instance of
timezone with zero offset in Python 3.
-
class libearth.tz.FixedOffset(offset, name=None)
Fixed offset in minutes east from UTC.
>>> kst = FixedOffset(9 * 60, name='Asia/Seoul') # KST +09:00
>>> current = now()
>>> current
datetime.datetime(2013, 8, 15, 3, 18, 37, 404562, tzinfo=libearth.tz.Utc())
>>> current.astimezone(kst)
datetime.datetime(2013, 8, 15, 12, 18, 37, 404562,
tzinfo=<libearth.tz.FixedOffset Asia/Seoul>)
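Following the tzinfo examples in the Python documentation, a fixed offset needs only utcoffset(), tzname() and dst(). This is a sketch in that style, not libearth's exact code; dst() returns a zero timedelta rather than None because the default tzinfo.fromutc(), used by astimezone(), rejects None.

```python
import datetime

ZERO = datetime.timedelta(0)


class FixedOffset(datetime.tzinfo):
    """Sketch: fixed offset in minutes east from UTC."""

    def __init__(self, offset, name=None):
        self._offset = datetime.timedelta(minutes=offset)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        # A fixed offset never observes daylight saving time.
        return ZERO

    def __repr__(self):
        return '<FixedOffset {0}>'.format(self._name)


kst = FixedOffset(9 * 60, name='Asia/Seoul')
utc_time = datetime.datetime(2013, 8, 15, 3, 18, 37, tzinfo=datetime.timezone.utc)
print(utc_time.astimezone(kst).hour)  # 12
```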
-
class libearth.tz.Utc
UTC.
In most cases, it doesn’t need to be directly instantiated:
there’s already the utc value.
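A Utc class in the style of the Python 2 documentation example, together with one plausible way to choose the module-level utc value described above, might look like this. The selection logic is an assumption; datetime.timezone.utc exists from Python 3.2 on, and older interpreters fall back to a single shared Utc instance.

```python
import datetime

ZERO = datetime.timedelta(0)


class Utc(datetime.tzinfo):
    """Sketch: UTC as a tzinfo, in the style of the Python 2 docs example."""

    def utcoffset(self, dt):
        return ZERO

    def tzname(self, dt):
        return 'UTC'

    def dst(self, dt):
        return ZERO

    def __repr__(self):
        return 'Utc()'


# Prefer the built-in singleton where it exists (Python 3.2+); otherwise
# share one Utc instance so callers never need to instantiate it.
try:
    utc = datetime.timezone.utc
except AttributeError:
    utc = Utc()
```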
-
libearth.tz.guess_tzinfo_by_locale(language, country=None)
Guess the most commonly used time zone from the given locale.
Parameters: |
- language (str) – the language code e.g. ko, JA
- country (str) – optional country code e.g. kr, JP
|
Returns: | the most commonly used time zone, or None if it can’t
guess
|
Return type: | datetime.tzinfo
|
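One plausible shape for such a guess is a case-insensitive table lookup that tries the (language, country) pair first and falls back to the language alone. The function name, the lookup order and the two sample entries are hypothetical illustrations; libearth's actual table is not reproduced here.

```python
import datetime

_KST = datetime.timezone(datetime.timedelta(hours=9), 'Asia/Seoul')
_JST = datetime.timezone(datetime.timedelta(hours=9), 'Asia/Tokyo')

# Hypothetical sample data only.
_BY_LANGUAGE_COUNTRY = {('ko', 'kr'): _KST, ('ja', 'jp'): _JST}
_BY_LANGUAGE = {'ko': _KST, 'ja': _JST}


def guess_tzinfo_sketch(language, country=None):
    """Hypothetical sketch: case-insensitive locale to tzinfo lookup."""
    language = language.lower()
    if country is not None:
        tzinfo = _BY_LANGUAGE_COUNTRY.get((language, country.lower()))
        if tzinfo is not None:
            return tzinfo
    # Fall back to the language alone; None when nothing is known.
    return _BY_LANGUAGE.get(language)
```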
-
libearth.tz.now()
Return the current datetime in UTC with the proper
tzinfo set.
>>> now()
datetime.datetime(2013, 8, 15, 3, 17, 11, 892272, tzinfo=libearth.tz.Utc())
>>> now()
datetime.datetime(2013, 8, 15, 3, 17, 17, 532483, tzinfo=libearth.tz.Utc())
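On Python 3 the same kind of aware UTC value can be obtained directly from the standard library. This sketch uses datetime.timezone.utc rather than libearth's Utc class, so its repr differs from the doctest output above.

```python
import datetime


def now():
    """Sketch: the current time as an aware datetime in UTC."""
    # datetime.now(tz) attaches tzinfo directly; the naive utcnow() is
    # deprecated since Python 3.12.
    return datetime.datetime.now(datetime.timezone.utc)
```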
-
libearth.version.VERSION = '0.3.2'
(str) The version string e.g. '1.2.3'.
-
libearth.version.VERSION_INFO = (0, 3, 2)
(tuple) The triple of version numbers e.g. (1, 2, 3).
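VERSION and VERSION_INFO carry the same information. The sketch below shows one way to keep them in sync by deriving the string from the tuple; this is an illustration, not necessarily how libearth defines them.

```python
VERSION_INFO = (0, 3, 2)
# Derive the dotted string from the tuple so the two cannot drift apart.
VERSION = '.'.join(str(part) for part in VERSION_INFO)

# Parsing the string back recovers the original triple.
parsed = tuple(int(part) for part in VERSION.split('.'))
```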