libearth.feed — Feeds

libearth internally stores archive data in Atom format. It does not implement RFC 4287 completely, but covers most of it as a subset. Since the format is intended for internal representation rather than crawling, it does not follow the robustness principle: it simply assumes that all stored data is valid and well-formed.

libearth.feed.ATOM_XMLNS = 'http://www.w3.org/2005/Atom'

(str) The XML namespace name used for Atom (RFC 4287).
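
As a rough illustration of where this namespace name shows up, the sketch below reads a tiny Atom document with the standard library's ElementTree. The sample document is hypothetical, the constant is repeated locally so the sketch runs without libearth, and this is not how libearth itself parses feeds.

```python
import xml.etree.ElementTree as ET

# Same value as the documented libearth.feed.ATOM_XMLNS, repeated here
# so that the sketch runs without libearth installed.
ATOM_XMLNS = 'http://www.w3.org/2005/Atom'

# A hypothetical, minimal Atom document used only for illustration.
document = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>Example feed</title>'
    '</feed>'
)

root = ET.fromstring(document)
# ElementTree spells qualified names as "{namespace}localname".
title = root.find('{%s}title' % ATOM_XMLNS)
feed_title = title.text
print(feed_title)
```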

libearth.feed.MARK_XMLNS = 'http://earthreader.org/mark/'

(str) The XML namespace name used for Earth Reader Mark metadata.

class libearth.feed.Category(_parent=None, **attributes)

Category element defined in RFC 4287 (section 4.2.2).

label

(str) The optional human-readable label for display in end-user applications. It corresponds to label attribute of RFC 4287 (section 4.2.2.3).

scheme_uri

(str) The URI that identifies a categorization scheme. It corresponds to scheme attribute of RFC 4287 (section 4.2.2.2).

term

(str) The required machine-readable identifier string of the category. It corresponds to term attribute of RFC 4287 (section 4.2.2.1).

class libearth.feed.Content(_parent=None, **attributes)

Content construct defined in RFC 4287 (section 4.1.3).

MIMETYPE_PATTERN = <_sre.SRE_Pattern object at 0x3f84250>

(re.RegexObject) The regular expression pattern that matches with valid MIME type strings.

TYPE_MIMETYPE_MAP = {'text': 'text/plain', 'xhtml': 'application/xhtml+xml', 'html': 'text/html'}

(collections.Mapping) The mapping of type string (e.g. 'text') to the corresponding MIME type (e.g. text/plain).
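
A minimal sketch of how such a mapping could be used to turn a type value into a MIME type. The mapping is copied from the documented value; the `resolve_mimetype` helper and the simplified `SIMPLE_MIMETYPE_PATTERN` are assumptions made for illustration and are not part of libearth.

```python
import re

# Copied from the documented value of Content.TYPE_MIMETYPE_MAP.
TYPE_MIMETYPE_MAP = {
    'text': 'text/plain',
    'html': 'text/html',
    'xhtml': 'application/xhtml+xml',
}

# Hypothetical, simplified pattern: "type/subtype" made of common token
# characters. The real Content.MIMETYPE_PATTERN may differ.
SIMPLE_MIMETYPE_PATTERN = re.compile(
    r'^[A-Za-z0-9!#$&^_.+-]+/[A-Za-z0-9!#$&^_.+-]+$'
)


def resolve_mimetype(type_value):
    """Return a MIME type for a short type name or a full MIME type."""
    if type_value in TYPE_MIMETYPE_MAP:
        return TYPE_MIMETYPE_MAP[type_value]
    if SIMPLE_MIMETYPE_PATTERN.match(type_value):
        return type_value
    raise ValueError('not a known type or MIME type: %r' % type_value)


print(resolve_mimetype('html'))
print(resolve_mimetype('image/png'))
```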

mimetype

(str) The mimetype of the content.

source_uri

(str) An optional remote content URI to retrieve the content.

class libearth.feed.Entry(_parent=None, **kwargs)

Represent an individual entry, acting as a container for metadata and data associated with the entry. It corresponds to atom:entry element of RFC 4287 (section 4.1.2).

content

(Content) It either contains or links to the content of the entry.

It corresponds to atom:content element of RFC 4287 (section 4.1.3).

published_at

(datetime.datetime) The tz-aware datetime indicating an instant in time associated with an event early in the life cycle of the entry. Typically, published_at will be associated with the initial creation or first availability of the resource. It corresponds to atom:published element of RFC 4287 (section 4.2.9).
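
To make "tz-aware" concrete, the sketch below contrasts a naive datetime with aware ones using only the standard library. The `is_tz_aware` helper and the sample timestamps are hypothetical and are not part of libearth.

```python
from datetime import datetime, timedelta, timezone


def is_tz_aware(value):
    """Return True if the datetime carries a usable UTC offset."""
    return value.tzinfo is not None and value.utcoffset() is not None


# Hypothetical timestamps used only for illustration.
naive = datetime(2013, 12, 13, 18, 30, 2)
aware_utc = datetime(2013, 12, 13, 18, 30, 2, tzinfo=timezone.utc)
aware_kst = datetime(2013, 12, 14, 3, 30, 2,
                     tzinfo=timezone(timedelta(hours=9)))

print(is_tz_aware(naive))      # False
print(is_tz_aware(aware_utc))  # True
# Aware datetimes compare by instant, regardless of their offsets.
print(aware_utc == aware_kst)  # True
```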

read

(Mark) Whether and when it’s read or unread.

source

(Source) If an entry is copied from one feed into another feed, then the source feed’s metadata may be preserved within the copied entry by adding source if it is not already present in the entry, and including some or all of the source feed’s metadata as the source’s data.

It is designed to allow the aggregation of entries from different feeds while retaining information about an entry’s source feed.

It corresponds to atom:source element of RFC 4287 (section 4.2.11).

starred

(Mark) Whether and when it’s starred or unstarred.

summary

(Text) The text field that conveys a short summary, abstract, or excerpt of the entry. It corresponds to atom:summary element of RFC 4287 (section 4.2.13).

class libearth.feed.Feed(_parent=None, **kwargs)

Atom feed document, acting as a container for metadata and data associated with the feed.

It corresponds to atom:feed element of RFC 4287 (section 4.1.1).

entries

(collections.MutableSequence) The list of Entry objects, each of which represents an individual entry and acts as a container for metadata and data associated with that entry. It corresponds to atom:entry element of RFC 4287 (section 4.1.2).

class libearth.feed.Generator(_parent=None, **attributes)

Identify the agent used to generate a feed, for debugging and other purposes. It corresponds to atom:generator element of RFC 4287 (section 4.2.4).

uri

(str) A URI that represents something relevant to the agent.

value

(str) The human-readable name for the generating agent.

version

(str) The version of the generating agent.

class libearth.feed.Link(_parent=None, **attributes)

Link element defined in RFC 4287 (section 4.2.7).

byte_size

(numbers.Integral) The optional hint for the length of the linked content in octets. It corresponds to length attribute of RFC 4287 (section 4.2.7.6).

html

(bool) Whether its mimetype is HTML (or XHTML).

New in version 0.2.0.

language

(str) The language of the linked content. It corresponds to hreflang attribute of RFC 4287 (section 4.2.7.4).

mimetype

(str) The optional hint for the MIME media type of the linked content. It corresponds to type attribute of RFC 4287 (section 4.2.7.3).

relation

(str) The relation type of the link. It corresponds to rel attribute of RFC 4287 (section 4.2.7.2).

title

(str) The title of the linked resource. It corresponds to title attribute of RFC 4287 (section 4.2.7.5).

uri

(str) The link’s required URI. It corresponds to href attribute of RFC 4287 (section 4.2.7.1).

class libearth.feed.LinkList

Element list mixin specialized for Link.

filter_by_mimetype(pattern)

Filter links by their mimetype e.g.:

links.filter_by_mimetype('text/html')

pattern can include wildcards (*) as well e.g.:

links.filter_by_mimetype('application/xml+*')

Parameters: pattern (str) – the mimetype pattern to filter

Returns: the filtered links

Return type: LinkList
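
The wildcard matching described above could work roughly as in the following sketch. The `Link` namedtuple, the free function, and the sample links are hypothetical stand-ins so that the example runs on its own; this is not libearth's implementation, and the real method returns a LinkList rather than a plain list.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for libearth.feed.Link with only the fields used here.
Link = namedtuple('Link', ['uri', 'mimetype'])


def filter_by_mimetype(links, pattern):
    """Return links whose mimetype matches pattern ('*' matches any run)."""
    # Escape the literal parts and let each '*' match any characters.
    regex = re.compile('.*'.join(map(re.escape, pattern.split('*'))))
    return [
        link for link in links
        if link.mimetype is not None and regex.fullmatch(link.mimetype)
    ]


# Hypothetical sample data.
links = [
    Link('http://example.com/', 'text/html'),
    Link('http://example.com/feed.atom', 'application/atom+xml'),
    Link('http://example.com/feed.rss', 'application/rss+xml'),
    Link('http://example.com/unknown', None),
]

html_links = filter_by_mimetype(links, 'text/html')
xml_links = filter_by_mimetype(links, 'application/*+xml')
print([link.uri for link in html_links])
print([link.uri for link in xml_links])
```

Links without a mimetype are skipped in this sketch; whether libearth treats them the same way is not stated in the text above.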

permalink

(Link) Find the permalink from the list. The following list shows the precedence of lookup conditions:

  1. html, and relation is 'alternate'
  2. html
  3. relation is 'alternate'
  4. No permalink: return None

New in version 0.2.0.
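
The lookup precedence listed above can be sketched as a series of passes over the links, one per condition. The `Link` namedtuple and the `find_permalink` function are hypothetical stand-ins, assuming each link exposes `relation` and `html` as documented for Link; libearth's actual implementation may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for libearth.feed.Link with only the fields used here.
Link = namedtuple('Link', ['uri', 'relation', 'html'])


def find_permalink(links):
    """Return the first link satisfying the highest-precedence condition."""
    conditions = [
        lambda link: link.html and link.relation == 'alternate',
        lambda link: link.html,
        lambda link: link.relation == 'alternate',
    ]
    # Try each condition in order; an earlier condition always wins.
    for condition in conditions:
        for link in links:
            if condition(link):
                return link
    return None  # no permalink


# Hypothetical sample data.
links = [
    Link('http://example.com/feed.atom', 'self', False),
    Link('http://example.com/alt.json', 'alternate', False),
    Link('http://example.com/', 'alternate', True),
]

permalink = find_permalink(links)
print(permalink.uri)
```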

class libearth.feed.Mark(_parent=None, **attributes)

Represent whether the entry is read, starred, or tagged by the user. It’s not a part of the RFC 4287 Atom standard, but an extension for Earth Reader.

marked

(bool) Whether it’s marked or not.

updated_at

(datetime.datetime) The time when the mark was last updated.

class libearth.feed.Metadata(_parent=None, **attributes)

Common metadata shared by Source, Entry, and Feed.

authors

(collections.MutableSequence) The list of Person objects which indicates the author of the entry or feed. It corresponds to atom:author element of RFC 4287 (section 4.2.1).

categories

(collections.MutableSequence) The list of Category objects that conveys information about categories associated with an entry or feed. It corresponds to atom:category element of RFC 4287 (section 4.2.2).

contributors

(collections.MutableSequence) The list of Person objects which indicates a person or other entity who contributed to the entry or feed. It corresponds to atom:contributor element of RFC 4287 (section 4.2.3).

id

(str) The URI that conveys a permanent, universally unique identifier for an entry or feed. It corresponds to atom:id element of RFC 4287 (section 4.2.6).

links

(LinkList) The list of Link objects that define a reference from an entry or feed to a web resource. It corresponds to atom:link element of RFC 4287 (section 4.2.7).

rights

(Text) The text field that conveys information about rights held in and of an entry or feed. It corresponds to atom:rights element of RFC 4287 (section 4.2.10).

title

(Text) The human-readable title for an entry or feed. It corresponds to atom:title element of RFC 4287 (section 4.2.14).

updated_at

(datetime.datetime) The tz-aware datetime indicating the most recent instant in time when the entry was modified in a way the publisher considers significant. Therefore, not all modifications necessarily result in a changed updated_at value. It corresponds to atom:updated element of RFC 4287 (section 4.2.15).

class libearth.feed.Person(_parent=None, **attributes)

Person construct defined in RFC 4287 (section 3.2).

email

(str) The optional email address associated with the person. It corresponds to atom:email element of RFC 4287 (section 3.2.3).

name

(str) The human-readable name for the person. It corresponds to atom:name element of RFC 4287 (section 3.2.1).

uri

(str) The optional URI associated with the person. It corresponds to atom:uri element of RFC 4287 (section 3.2.2).

class libearth.feed.Source(_parent=None, **attributes)

All metadata for Feed except Feed.entries. It corresponds to atom:source element of RFC 4287 (section 4.2.11).

generator

(Generator) Identify the agent used to generate a feed, for debugging and other purposes. It corresponds to atom:generator element of RFC 4287 (section 4.2.4).

icon

(str) URI that identifies an image that provides iconic visual identification for a feed. It corresponds to atom:icon element of RFC 4287 (section 4.2.5).

logo

(str) URI that identifies an image that provides visual identification for a feed. It corresponds to atom:logo element of RFC 4287 (section 4.2.8).

subtitle

(Text) A text that conveys a human-readable description or subtitle for a feed. It corresponds to atom:subtitle element of RFC 4287 (section 4.2.12).

class libearth.feed.Text(_parent=None, **attributes)

Text construct defined in RFC 4287 (section 3.1).

sanitized_html

(str) The secure HTML string of the text. If it’s plain text, this becomes an entity-escaped HTML string (for example, '<Hello>' becomes '&lt;Hello&gt;'), and if it’s HTML text, the value is sanitized (for example, '<script>alert(1);</script><p>Hello</p>' becomes '<p>Hello</p>').
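
The two cases described above can be illustrated with a rough standard-library sketch. The `sanitized_html` function and the `_ScriptStripper` class are hypothetical and are not libearth's implementation. The stripper only drops script elements and comments; it leaves event-handler attributes and javascript: URLs in place, so it is not a safe sanitizer for untrusted input.

```python
import html
from html.parser import HTMLParser


class _ScriptStripper(HTMLParser):
    """Re-emit HTML while dropping <script> elements and their contents."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.script_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self.script_depth += 1
        elif not self.script_depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, unchanged.
        if tag != 'script' and not self.script_depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == 'script':
            self.script_depth = max(0, self.script_depth - 1)
        elif not self.script_depth:
            self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        if not self.script_depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if not self.script_depth:
            self.parts.append('&%s;' % name)

    def handle_charref(self, name):
        if not self.script_depth:
            self.parts.append('&#%s;' % name)


def sanitized_html(value, type_='text'):
    """Escape plain text, or strip script elements from HTML text."""
    if type_ == 'text':
        return html.escape(value, quote=False)
    stripper = _ScriptStripper()
    stripper.feed(value)
    stripper.close()
    return ''.join(stripper.parts)


print(sanitized_html('<Hello>', 'text'))
print(sanitized_html('<script>alert(1);</script><p>Hello</p>', 'html'))
```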

type

(str) The type of the text. It could be one of 'text' or 'html'. It corresponds to RFC 4287 (section 3.1.1).

Note

It currently does not support 'xhtml'.

value

(str) The content of the text. Interpretation for this has to differ according to its type. It corresponds to RFC 4287 (section 3.1.1.1) if type is 'text', and RFC 4287 (section 3.1.1.2) if type is 'html'.
