libearth.parser.atom: Atom parser
Parses Atom feeds. The Atom format is specified in RFC 4287.
- libearth.parser.atom.ATOM_XMLNS_SET = frozenset(['http://purl.org/atom/ns#', 'http://www.w3.org/2005/Atom'])
  (frozenset) The set of XML namespaces for the Atom format.
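A set like this is typically used to decide whether a document is Atom at all. The following is a minimal sketch using only the standard library, assuming Python 3. The set is copied from the value documented above, and `root_namespace` and `looks_like_atom` are hypothetical helpers, not part of libearth.

```python
import xml.etree.ElementTree as ET

# Copied from the documented value of ATOM_XMLNS_SET.
ATOM_XMLNS_SET = frozenset([
    'http://purl.org/atom/ns#',
    'http://www.w3.org/2005/Atom',
])


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None.

    Hypothetical helper for illustration only.
    """
    tag = ET.fromstring(xml_text).tag
    # ElementTree spells qualified names as '{namespace}localname'.
    if tag.startswith('{'):
        return tag[1:].split('}', 1)[0]
    return None


def looks_like_atom(xml_text):
    """True when the root element is in one of the Atom namespaces."""
    return root_namespace(xml_text) in ATOM_XMLNS_SET
```

Both the older `http://purl.org/atom/ns#` namespace and the RFC 4287 namespace pass this check, while a root element without a namespace (such as an RSS 2.0 `<rss>` element) does not.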
- class libearth.parser.atom.AtomSession(xml_base, element_ns)
  The session class used for parsing an Atom feed.
- libearth.parser.atom.XML_XMLNS = 'http://www.w3.org/XML/1998/namespace'
  (str) The XML namespace for the predefined xml: prefix.
- libearth.parser.atom.get_xml_base(data, default)
  Extracts the xml:base attribute from the element. If the element has no xml:base attribute, the default value is returned.
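The lookup-with-fallback behavior described here can be sketched with the standard library's ElementTree. This is an illustration of the documented behavior, not libearth's actual implementation. The name `get_xml_base_sketch` is hypothetical, and the element is assumed to be an `xml.etree.ElementTree` element.

```python
import xml.etree.ElementTree as ET

# Same value as the documented XML_XMLNS constant.
XML_XMLNS = 'http://www.w3.org/XML/1998/namespace'


def get_xml_base_sketch(element, default):
    """Return the element's xml:base value, or `default` if it has none.

    Hypothetical stand-in for the documented get_xml_base().
    """
    # ElementTree exposes xml:base under its expanded name,
    # '{http://www.w3.org/XML/1998/namespace}base'.
    return element.get('{' + XML_XMLNS + '}base', default)


with_base = ET.fromstring('<feed xml:base="http://example.com/blog/"/>')
without_base = ET.fromstring('<feed/>')

print(get_xml_base_sketch(with_base, 'http://fallback.example/'))
print(get_xml_base_sketch(without_base, 'http://fallback.example/'))
```

The first call prints the value from the attribute. The second prints the fallback, because the element carries no xml:base.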
- libearth.parser.atom.parse_atom(xml, feed_url, need_entries=True)
  Atom parser. It parses the Atom XML and returns the feed data in its internal representation.
  Parameters:
  - xml (str) – the target Atom XML to parse
  - feed_url (str) – the URL used to retrieve the Atom feed. It serves as the base URL for any relative URLs that have no xml:base attribute
  - need_entries (bool) – whether to parse inner items as well. Skipping items is useful when retrieving <source> in RSS 2.0. True by default.
  Returns: a pair of (Feed, crawler hint)
  Return type:
  Changed in version 0.4.0: The parse_entries parameter was renamed to need_entries.
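The role of feed_url as a fallback base can be illustrated without libearth. The sketch below, assuming Python 3 and only the standard library, resolves relative link URLs against the root element's xml:base when present and against feed_url otherwise. `resolve_links` is a hypothetical helper. It deliberately ignores nested or relative xml:base values, which a real parser would need to handle.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'
XML_BASE_ATTR = '{http://www.w3.org/XML/1998/namespace}base'


def resolve_links(xml_text, feed_url):
    """Return absolute URLs for every Atom <link href> in the document.

    Hypothetical helper: only the root element's xml:base is consulted.
    """
    root = ET.fromstring(xml_text)
    # Fall back to the URL the feed was fetched from when no xml:base is set.
    base = root.get(XML_BASE_ATTR, feed_url)
    return [
        urljoin(base, link.get('href', ''))
        for link in root.iter('{%s}link' % ATOM_NS)
    ]


plain = ('<feed xmlns="http://www.w3.org/2005/Atom">'
         '<link href="posts/1"/></feed>')
based = ('<feed xmlns="http://www.w3.org/2005/Atom" '
         'xml:base="http://cdn.example.org/">'
         '<link href="posts/1"/></feed>')

print(resolve_links(plain, 'http://example.com/blog/atom.xml'))
print(resolve_links(based, 'http://example.com/blog/atom.xml'))
```

In the first document the link resolves against the feed URL. In the second, the explicit xml:base takes precedence and feed_url is not used.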