libearth.feed
— Feeds¶
libearth
internally stores archive data in Atom format. It does not implement
RFC 4287 completely, but it covers most of it.
Since it is intended as an internal representation rather than for crawling, it does not
follow the robustness principle or anything similar. It simply assumes that all stored data
are valid and well-formed.
-
libearth.feed.
ATOM_XMLNS
= 'http://www.w3.org/2005/Atom'¶
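As a rough illustration of how this namespace name is used when reading Atom XML, the sketch below parses a tiny hand-written entry with the standard library's xml.etree.ElementTree. It does not use libearth itself; the sample document and the atom_tag helper are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Same value as libearth.feed.ATOM_XMLNS, repeated so the sketch is standalone.
ATOM_XMLNS = 'http://www.w3.org/2005/Atom'

# Hypothetical sample document, not taken from libearth.
SAMPLE = '''<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:1</id>
  <title type="text">Hello</title>
</entry>'''


def atom_tag(local_name):
    """Build the qualified tag name ElementTree uses for an Atom element."""
    return '{%s}%s' % (ATOM_XMLNS, local_name)


root = ET.fromstring(SAMPLE)
title = root.find(atom_tag('title'))
entry_id = root.find(atom_tag('id'))
print(title.text, title.get('type'), entry_id.text)
```

Looking elements up by their namespace-qualified name, rather than by prefix, keeps the lookup independent of whatever prefix a document happens to declare.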
-
libearth.feed.
MARK_XMLNS
= 'http://earthreader.org/mark/'¶ (
str
) The XML namespace name used for Earth Reader Mark
metadata.
-
class
libearth.feed.
Category
(_parent=None, **attributes)¶ Bases:
libearth.schema.Element
Category element defined in RFC 4287#section-4.2.2 (section 4.2.2).
-
label
¶ (
str
) The optional human-readable label for display in end-user applications. It corresponds to the label
attribute of RFC 4287#section-4.2.2.3 (section 4.2.2.3).
-
scheme_uri
¶ (
str
) The URI that identifies a categorization scheme. It corresponds to the scheme
attribute of RFC 4287#section-4.2.2.2 (section 4.2.2.2).
See also
- Tag Scheme? by Tim Bray
- Representing tags in Atom by Edward O’Connor
-
term
¶ (
str
) The required machine-readable identifier string of the category. It corresponds to the term
attribute of RFC 4287#section-4.2.2.1 (section 4.2.2.1).
-
-
class
libearth.feed.
Content
(_parent=None, **attributes)¶ Bases:
libearth.feed.Text
Content construct defined in RFC 4287#section-4.1.3 (section 4.1.3).
-
MIMETYPE_PATTERN
= <_sre.SRE_Pattern object>¶ (
re.RegexObject
) The regular expression pattern that matches valid MIME type strings.
-
TYPE_MIMETYPE_MAP
= {'text': 'text/plain', 'xhtml': 'application/xhtml+xml', 'html': 'text/html'}¶ (
collections.Mapping
) The mapping of type
string (e.g. 'text'
) to the corresponding MIME type (e.g. text/plain).
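The following is a minimal sketch of how such a mapping might be used to turn an Atom type value into a MIME type. The mapping is copied from the documented value above; the resolve_mimetype function and the simplified pattern are hypothetical and stand in for whatever Content does internally, which this document does not show.

```python
import re

# Copied from the documented value of Content.TYPE_MIMETYPE_MAP.
TYPE_MIMETYPE_MAP = {
    'text': 'text/plain',
    'xhtml': 'application/xhtml+xml',
    'html': 'text/html',
}

# Assumed, simplified stand-in for Content.MIMETYPE_PATTERN; the real
# pattern is not shown in this document.
SIMPLE_MIMETYPE_PATTERN = re.compile(
    r'^[A-Za-z0-9!#$&^_.+-]+/[A-Za-z0-9!#$&^_.+-]+$'
)


def resolve_mimetype(type_):
    """Return a MIME type for an Atom ``type`` value, or None if unknown."""
    if type_ in TYPE_MIMETYPE_MAP:
        return TYPE_MIMETYPE_MAP[type_]
    if SIMPLE_MIMETYPE_PATTERN.match(type_):
        return type_  # already looks like a MIME type, e.g. 'image/png'
    return None


print(resolve_mimetype('html'))
print(resolve_mimetype('image/png'))
```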
-
-
class
libearth.feed.
Entry
(_parent=None, **kwargs)¶ Bases:
libearth.schema.DocumentElement
,libearth.feed.Metadata
Represent an individual entry, acting as a container for metadata and data associated with the entry. It corresponds to
atom:entry
element of RFC 4287#section-4.1.2 (section 4.1.2).
-
content
¶ (
Content
) It either contains or links to the content of the entry. It corresponds to
atom:content
element of RFC 4287#section-4.1.3 (section 4.1.3).
-
published_at
¶ (
datetime.datetime
) The tz-aware datetime
indicating an instant in time associated with an event early in the life cycle of the entry. Typically, published_at
will be associated with the initial creation or first availability of the resource. It corresponds to the atom:published
element of RFC 4287#section-4.2.9 (section 4.2.9).
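To make "tz-aware" concrete, here is a small standard-library sketch. It does not touch libearth; the sample timestamp and the is_tz_aware helper are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical publication time, expressed at UTC+09:00.
published_at = datetime(2013, 7, 1, 9, 30,
                        tzinfo=timezone(timedelta(hours=9)))


def is_tz_aware(dt):
    """True if the datetime carries a usable UTC offset."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


# Aware datetimes can be converted between zones without ambiguity.
as_utc = published_at.astimezone(timezone.utc)
print(is_tz_aware(published_at), as_utc.isoformat())
```

A naive datetime (one without tzinfo) cannot be compared with an aware one, which is one reason to keep every stored timestamp aware.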
-
source
¶ (
Source
) If an entry is copied from one feed into another feed, then the source feed’s metadata may be preserved within the copied entry by adding source
if it is not already present in the entry, and including some or all of the source feed’s metadata as the source
’s data. It is designed to allow the aggregation of entries from different feeds while retaining information about an entry’s source feed.
It corresponds to
atom:source
element of RFC 4287#section-4.2.11 (section 4.2.11).
-
summary
¶ (
Text
) The text field that conveys a short summary, abstract, or excerpt of the entry. It corresponds to the atom:summary
element of RFC 4287#section-4.2.13 (section 4.2.13).
-
-
class
libearth.feed.
EntryList
¶ Bases:
_abcoll.MutableSequence
Element list mixin specialized for
Entry
.
-
sort_entries
()¶ Sort entries in time order.
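The description above does not say which timestamp is used or in which direction the entries are ordered. The sketch below assumes sorting in place by updated_at, newest first; both choices, and the SketchEntry class, are assumptions for illustration and not a statement about libearth's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SketchEntry:
    """Hypothetical stand-in for Entry with only the fields needed here."""
    id: str
    updated_at: datetime


def sort_entries(entries):
    """Sort in place by updated_at, newest first (direction is assumed)."""
    entries.sort(key=lambda entry: entry.updated_at, reverse=True)


entries = [
    SketchEntry('a', datetime(2013, 1, 1, tzinfo=timezone.utc)),
    SketchEntry('c', datetime(2013, 3, 1, tzinfo=timezone.utc)),
    SketchEntry('b', datetime(2013, 2, 1, tzinfo=timezone.utc)),
]
sort_entries(entries)
print([entry.id for entry in entries])
```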
-
-
class
libearth.feed.
Feed
(_parent=None, **kwargs)¶ Bases:
libearth.session.MergeableDocumentElement
,libearth.feed.Source
Atom feed document, acting as a container for metadata and data associated with the feed.
It corresponds to
atom:feed
element of RFC 4287#section-4.1.1 (section 4.1.1).
-
entries
¶ (
collections.MutableSequence
) The list of Entry
objects that represent an individual entry, acting as a container for metadata and data associated with the entry. It corresponds to the atom:entry
element of RFC 4287#section-4.1.2 (section 4.1.2).
-
-
class
libearth.feed.
Generator
(_parent=None, **attributes)¶ Bases:
libearth.schema.Element
Identify the agent used to generate a feed, for debugging and other purposes. It corresponds to the
atom:generator
element of RFC 4287#section-4.2.4 (section 4.2.4).
-
class
libearth.feed.
Link
(_parent=None, **attributes)¶ Bases:
libearth.schema.Element
Link element defined in RFC 4287#section-4.2.7 (section 4.2.7).
-
byte_size
¶ (
numbers.Integral
) The optional hint for the length of the linked content in octets. It corresponds to the length
attribute of RFC 4287#section-4.2.7.6 (section 4.2.7.6).
-
language
¶ (
str
) The language of the linked content. It corresponds to the hreflang
attribute of RFC 4287#section-4.2.7.4 (section 4.2.7.4).
-
mimetype
¶ (
str
) The optional hint for the MIME media type of the linked content. It corresponds to the type
attribute of RFC 4287#section-4.2.7.3 (section 4.2.7.3).
-
relation
¶ (
str
) The relation type of the link. It corresponds to the rel
attribute of RFC 4287#section-4.2.7.2 (section 4.2.7.2).
See also
- Existing rel values — Microformats Wiki
- This page contains tables of known HTML
rel
values from specifications, formats, proposals, brainstorms, and non-trivial POSH usage in the wild. In addition, dropped and rejected values are listed at the end for comprehensiveness.
-
title
¶ (
str
) The title of the linked resource. It corresponds to the title
attribute of RFC 4287#section-4.2.7.5 (section 4.2.7.5).
-
uri
¶ (
str
) The link’s required URI. It corresponds to the href
attribute of RFC 4287#section-4.2.7.1 (section 4.2.7.1).
-
-
class
libearth.feed.
LinkList
¶ Bases:
_abcoll.MutableSequence
Element list mixin specialized for
Link
.
-
favicon
¶ (
Link
) Find the link to a favicon, also known as a shortcut or bookmark icon, if it exists.
New in version 0.3.0.
-
filter_by_mimetype
(pattern)¶ Filter links by their
mimetype
, e.g.:
    links.filter_by_mimetype('text/html')
pattern
can include wildcards (*
) as well, e.g.:
    links.filter_by_mimetype('application/xml+*')
Parameters: pattern (str) – the mimetype pattern to filter by
Returns: the filtered links
Return type: LinkList
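The wildcard matching described above can be approximated with the standard library's fnmatch module. This is only a sketch of the idea: SketchLink is a hypothetical stand-in for Link, the function returns a plain list rather than a LinkList, and whether libearth uses fnmatch internally is not stated in this document.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class SketchLink:
    """Hypothetical stand-in for Link with only the fields needed here."""
    uri: str
    mimetype: str


def filter_by_mimetype(links, pattern):
    """Return the links whose mimetype matches pattern ('*' is a wildcard)."""
    return [link for link in links if fnmatchcase(link.mimetype, pattern)]


links = [
    SketchLink('http://example.com/', 'text/html'),
    SketchLink('http://example.com/feed.atom', 'application/atom+xml'),
    SketchLink('http://example.com/feed.rss', 'application/rss+xml'),
]
print([link.uri for link in filter_by_mimetype(links, 'application/*')])
```

Note that fnmatchcase also treats `?` and `[...]` as special, which is harmless for typical MIME type strings but differs from a pure `*`-only wildcard.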
-
-
class
libearth.feed.
Mark
(_parent=None, **attributes)¶ Bases:
libearth.schema.Element
Represents whether the entry has been read, starred, or tagged by the user. It is not part of the RFC 4287 Atom standard, but an extension for Earth Reader.
-
updated_at
¶ (
datetime.datetime
) Updated time.
-
-
class
libearth.feed.
Metadata
(_parent=None, **attributes)¶ Bases:
libearth.schema.Element
Common metadata shared by
Source
, Entry
, and Feed
.
-
authors
¶ (
collections.MutableSequence
) The list of Person
objects that indicate the author of the entry or feed. It corresponds to the atom:author
element of RFC 4287#section-4.2.1 (section 4.2.1).
-
categories
¶ (
collections.MutableSequence
) The list of Category
objects that convey information about categories associated with an entry or feed. It corresponds to the atom:category
element of RFC 4287#section-4.2.2 (section 4.2.2).
-
contributors
¶ (
collections.MutableSequence
) The list of Person
objects that indicate a person or other entity who contributed to the entry or feed. It corresponds to the atom:contributor
element of RFC 4287#section-4.2.3 (section 4.2.3).
-
id
¶ (
str
) The URI that conveys a permanent, universally unique identifier for an entry or feed. It corresponds to the atom:id
element of RFC 4287#section-4.2.6 (section 4.2.6).
-
links
¶ (
LinkList
) The list of Link
objects that define a reference from an entry or feed to a web resource. It corresponds to the atom:link
element of RFC 4287#section-4.2.7 (section 4.2.7).
-
rights
¶ (
Text
) The text field that conveys information about rights held in and over an entry or feed. It corresponds to the atom:rights
element of RFC 4287#section-4.2.10 (section 4.2.10).
-
title
¶ (
Text
) The human-readable title for an entry or feed. It corresponds to the atom:title
element of RFC 4287#section-4.2.14 (section 4.2.14).
-
updated_at
¶ (
datetime.datetime
) The tz-aware datetime
indicating the most recent instant in time when the entry was modified in a way the publisher considers significant. Therefore, not all modifications necessarily result in a changed updated_at
value. It corresponds to the atom:updated
element of RFC 4287#section-4.2.15 (section 4.2.15).
-
class
libearth.feed.
Person
(_parent=None, **attributes)¶ Bases:
libearth.schema.Element
Person construct defined in RFC 4287#section-3.2 (section 3.2).
-
email
¶ (
str
) The optional email address associated with the person. It corresponds to the atom:email
element of RFC 4287#section-3.2.3 (section 3.2.3).
-
name
¶ (
str
) The human-readable name for the person. It corresponds to the atom:name
element of RFC 4287#section-3.2.1 (section 3.2.1).
-
uri
¶ (
str
) The optional URI associated with the person. It corresponds to the atom:uri
element of RFC 4287#section-3.2.2 (section 3.2.2).
-
-
class
libearth.feed.
Source
(_parent=None, **attributes)¶ Bases:
libearth.feed.Metadata
All metadata for
Feed
except Feed.entries
. It corresponds to the atom:source
element of RFC 4287#section-4.2.11 (section 4.2.11).
-
generator
¶ (
Generator
) Identify the agent used to generate a feed, for debugging and other purposes. It corresponds to the atom:generator
element of RFC 4287#section-4.2.4 (section 4.2.4).
-
icon
¶ (
str
) URI that identifies an image that provides iconic visual identification for a feed. It corresponds to the atom:icon
element of RFC 4287#section-4.2.5 (section 4.2.5).
-
logo
¶ (
str
) URI that identifies an image that provides visual identification for a feed. It corresponds to the atom:logo
element of RFC 4287#section-4.2.8 (section 4.2.8).
-
subtitle
¶ (
Text
) The text that conveys a human-readable description or subtitle for a feed. It corresponds to the atom:subtitle
element of RFC 4287#section-4.2.12 (section 4.2.12).
-
-
class
libearth.feed.
Text
(_parent=None, **attributes)¶ Bases:
libearth.schema.Element
Text construct defined in RFC 4287#section-3.1 (section 3.1).
-
get_sanitized_html
(base_uri=None)¶ Get the secure HTML string of the text. If it’s plain text, this returns an entity-escaped HTML string (for example,
'<Hello>'
becomes '&lt;Hello&gt;'
), and if it’s HTML text, the value
is sanitized (for example, '<script>alert(1);</script><p>Hello</p>'
becomes '<p>Hello</p>'
).
New in version 0.4.0.
-
sanitized_html
¶ Get the secure HTML string of the text. If it’s plain text, this returns an entity-escaped HTML string (for example,
'<Hello>'
becomes '&lt;Hello&gt;'
), and if it’s HTML text, the value
is sanitized (for example, '<script>alert(1);</script><p>Hello</p>'
becomes '<p>Hello</p>'
).
New in version 0.4.0.
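The two behaviours described above (escaping plain text, stripping scripts from HTML) can be sketched with the standard library. This is a deliberately minimal illustration and not libearth's sanitizer: the sanitized_html function and the _ScriptStripper class are hypothetical, and the stripper only removes script elements, so event-handler attributes and other unsafe markup would pass through unchanged.

```python
from html import escape
from html.parser import HTMLParser


class _ScriptStripper(HTMLParser):
    """Re-emit HTML while dropping <script> elements and their contents."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a <script> element

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self.skip_depth += 1
        elif not self.skip_depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through as written.
        if tag != 'script' and not self.skip_depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == 'script':
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.parts.append('&%s;' % name)

    def handle_charref(self, name):
        if not self.skip_depth:
            self.parts.append('&#%s;' % name)


def sanitized_html(value, type_='text'):
    """Escape plain text, or strip <script> elements from HTML text."""
    if type_ == 'text':
        return escape(value, quote=False)
    parser = _ScriptStripper()
    parser.feed(value)
    parser.close()
    return ''.join(parser.parts)


print(sanitized_html('<Hello>'))
print(sanitized_html('<script>alert(1);</script><p>Hello</p>', 'html'))
```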
-
type
¶ (
str
) The type of the text. It could be one of 'text'
or 'html'
. It corresponds to RFC 4287#section-3.1.1 (section 3.1.1).
-
value
¶ (
str
) The content of the text. Its interpretation differs according to its type
. It corresponds to RFC 4287#section-3.1.1.1 (section 3.1.1.1) if type
is 'text'
, and RFC 4287#section-3.1.1.2 (section 3.1.1.2) if type
is 'html'
.
-