Core concepts¶
To achieve the goal of Earth Reader, its design need to resolve the following subproblems:
- Data should be stored in tangible format and more specifically, in plain text with well-structured directory layout. It would be much better if data can be easily read and parsed by other softwares.
- Data should be possible to be synchronized through several existing utilities including Dropbox, Google Drive, and even rsync, without any data corruption. In this docs we try to explain core concepts of libearth and what these concepts purpose to resolve.
Schema¶
All data libearth deals with are based on (de facto) standard formats. For example, it stores subscription list and its category hierarchy to an OPML file. OPML have been a de facto standard format to exchange subscription list by feed readers. It also stores all feed data to Atom format (RFC 4287).
Actually the most technologies related to RSS/syndication formats are from early 00’s, and it means they had used XML instead of JSON today we use for the same purpose. OPML is an (though poorly structured) XML format, and Atom also is an XML format.
Since we need to deal with several XML data and not need any other formats,
we decided to make something first-class model objects to XML like ORM to
relational databases. You can find how it can be used for designing model
objects at libearth/feed.py
and libearth/subscribe.py
.
It looks similar to Django ORM and SQLAlchemy, and makes you to deal with XML
documents in the same way you use plain Python objects.
Under the hood it does incremental parsing using SAX instead of DOM to reduce memory usage when the document is larger than a hundred megabytes.
See also
- Module
libearth.schema
- Declarative schema for pulling DOM parser of XML
Read-time merge¶
Earth Reader data can be shared by multiple installations e.g. desktop apps, mobile apps, web apps. So there must be simultaneous updates between them that could conflict. An important constraint we have is synchronization isn’t done by Earth Reader. We can’t lock files nor do atomic operations on them.
Our solution to this is read-time merge. All data are not shared between installations at least in filesystem level. They have isolated files for the same entities, and libearth merges all of them when it’s loaded into memory. Merged result doesn’t affect to all replicas but only a replica that corresponds to the installation. You can understand the approach similar to DVCS (although there are actually many differences): installations are branches, and updates from others can be pulled to mine. If there are simultaneous changes, these are merged and then committed to mine. If there’s no change for me, simply pull changes from others without merge. A big difference is that there’s no push. You can only do pull others, or wait others to pull yours. It’s because the most of existing synchronization utilities like Dropbox passively works in background. Moreover there could be offline.
Repository¶
Repository
abstracts storage backend
e.g. filesystem. There might be platforms that have no chance to
directly access file system e.g. iOS, and in that case the concept of
repository makes you to store data directly to Dropbox or Google Drive
instead of filesystem. However in the most cases we will simply use
FileSystemRepository
even if data are
synchronized using Dropbox or rsync.
See also
- Module
libearth.repository
- Repositories
Session¶
Session
abstracts installations.
Every installation has its own session identifier.
To be more exact it purposes to distinguish processes,
hence every process has its unique identifier even if they are child
processes of the same installation e.g. prefork workers.
Every session makes its own file for a document, for example,
if there are two sessions identified a and b, two files for a document
e.g. doc.xml
will be made doc.a.xml
and doc.b.xml
respectively.
For each change a session merges all changes from other sessions when a document is being loaded (read-time merge).
See also
- Module
libearth.session
- Isolate data from other installations
Stage¶
Stage
is a unit of changes i.e. an atomic
changes to be merged. It provides transactions for multi threaded environment.
If there are simultaneous changes from other sessions or other transactions,
these are automatically merged when the currently ongoing transaction is
committed.
Stage also provides Route
, a convenient interface to
access documents.
For example, you can read the subscription list by stage.subscriptions
,
and write it by stage.subscriptions = new_subscriptions
.
In the similar way you can read a feed by stage.feeds[feed_id]
,
and write it by stage.feeds[feed_id] = new_feed
.
See also
- Module
libearth.stage
- Staging updates and transactions