libearth.parser.atom — Atom parser

Parses Atom feeds. The Atom format is specified in RFC 4287.

libearth.parser.atom.ATOM_XMLNS_SET = frozenset(['http://purl.org/atom/ns#', 'http://www.w3.org/2005/Atom'])

(frozenset) The set of XML namespaces recognized as Atom: http://purl.org/atom/ns# (Atom 0.3) and http://www.w3.org/2005/Atom (Atom 1.0).
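This is a minimal sketch of how a namespace set like this one can be used to recognize an Atom document. It uses only the standard library's xml.etree.ElementTree. The helpers root_namespace and looks_like_atom are hypothetical and not part of libearth. Only the constant's value is taken from this page.

```python
import xml.etree.ElementTree as ET

# Value copied from the documented constant: Atom 0.3 and Atom 1.0 namespaces.
ATOM_XMLNS_SET = frozenset(['http://purl.org/atom/ns#',
                            'http://www.w3.org/2005/Atom'])


def root_namespace(xml):
    """Return the namespace URI of the root element, or '' if it has none."""
    tag = ET.fromstring(xml).tag
    # ElementTree writes namespaced tags in Clark notation: '{namespace}localname'.
    if tag.startswith('{'):
        return tag[1:].split('}', 1)[0]
    return ''


def looks_like_atom(xml):
    """Hypothetical helper: True if the root element is in an Atom namespace."""
    return root_namespace(xml) in ATOM_XMLNS_SET
```

For example, `looks_like_atom('<feed xmlns="http://www.w3.org/2005/Atom"/>')` is true, while `looks_like_atom('<rss version="2.0"/>')` is false.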

class libearth.parser.atom.AtomSession(xml_base, element_ns)

The session class used for parsing the Atom feed.

element_ns = None

(str) The namespace of the feed, used to look up the element's attributes, such as id.

xml_base = None

(str) The xml:base used to resolve relative URIs given in the element into full (absolute) URIs.
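The following sketch shows how the two attributes could work together in a session-like object. SessionSketch and its descend and resolve methods are hypothetical and do not represent AtomSession's real API. It assumes relative URIs are resolved with urllib.parse.urljoin. It also assumes that a child element's xml:base is resolved against its parent's base, as the XML Base specification prescribes.

```python
from dataclasses import dataclass, replace
from urllib.parse import urljoin


@dataclass(frozen=True)
class SessionSketch:
    """Hypothetical stand-in for AtomSession; not libearth's implementation."""
    xml_base: str    # base URI in effect for the current element
    element_ns: str  # namespace URI of the feed's elements

    def descend(self, element_xml_base):
        """Return the session to use for a child element.

        A child's xml:base may itself be relative, so it is resolved
        against the base inherited from the parent.
        """
        if element_xml_base is None:
            return self
        return replace(self, xml_base=urljoin(self.xml_base, element_xml_base))

    def resolve(self, uri):
        """Turn a possibly relative URI into an absolute one."""
        return urljoin(self.xml_base, uri)
```

With a base of `http://example.com/blog/feed.xml`, `resolve('posts/1')` gives `http://example.com/blog/posts/1`. After `descend('/archive/')`, `resolve('2014.html')` gives `http://example.com/archive/2014.html`. The dataclass is frozen so that a child session never alters the parent's base.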

libearth.parser.atom.XML_XMLNS = 'http://www.w3.org/XML/1998/namespace'

(str) The XML namespace for the predefined xml: prefix.
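The xml: prefix is bound by the XML specification itself, so a document never declares it. This short standard-library example shows how xml:base and xml:lang then appear in xml.etree.ElementTree, keyed by this namespace in Clark notation ({namespace}name). It is illustrative only and does not use libearth.

```python
import xml.etree.ElementTree as ET

XML_XMLNS = 'http://www.w3.org/XML/1998/namespace'

# No xmlns:xml declaration is needed: the prefix is predefined.
doc = ET.fromstring('<entry xml:base="http://example.com/" xml:lang="en"/>')

# ElementTree reports these attributes under keys built from XML_XMLNS.
base = doc.get('{%s}base' % XML_XMLNS)
lang = doc.get('{%s}lang' % XML_XMLNS)
print(base, lang)
```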

libearth.parser.atom.get_xml_base(data, default)

Extracts the xml:base attribute of the element. If the element has no xml:base attribute, returns default instead.
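This is a sketch of the behaviour described above. It assumes data is an xml.etree.ElementTree element, because the type libearth actually passes is not documented here. get_xml_base_sketch is a hypothetical name. It returns the raw attribute value and does not resolve it against default.

```python
import xml.etree.ElementTree as ET

XML_XMLNS = 'http://www.w3.org/XML/1998/namespace'


def get_xml_base_sketch(element, default):
    """Hypothetical analogue of get_xml_base for an ElementTree element.

    Returns the element's own xml:base attribute, or ``default`` when the
    attribute is absent.
    """
    return element.get('{%s}base' % XML_XMLNS, default)


with_base = ET.fromstring('<link xml:base="http://a.example/" href="x"/>')
without_base = ET.fromstring('<link href="x"/>')
```

Here, `get_xml_base_sketch(with_base, 'http://fallback/')` returns `'http://a.example/'`, and `get_xml_base_sketch(without_base, 'http://fallback/')` returns `'http://fallback/'`.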

libearth.parser.atom.parse_atom(xml, feed_url, need_entries=True)

Atom parser. It parses the given Atom XML and returns the feed data in libearth's internal representation.

Parameters:
  • xml (str) – the Atom XML to parse
  • feed_url (str) – the URL from which the Atom feed was retrieved. It is used as the base URL for resolving relative URLs that have no xml:base attribute.
  • need_entries (bool) – whether to parse the entries as well. Setting it to False is useful for skipping entries when retrieving <source> in RSS 2.0. True by default.
Returns:
  a pair of (Feed, crawler hint)
Return type:
  tuple

Changed in version 0.4.0: The parse_entries parameter was renamed to need_entries.
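The roles of feed_url and need_entries can be sketched with a much-reduced stand-in for parse_atom. parse_links_sketch is hypothetical and has several limits: it handles only Atom 1.0, collects only link href values rather than building a Feed, and returns no crawler hint. It assumes that each xml:base is resolved against the base inherited from its parent, starting from feed_url, as the XML Base specification prescribes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = 'http://www.w3.org/2005/Atom'
XML_BASE = '{http://www.w3.org/XML/1998/namespace}base'


def _base_for(element, inherited):
    """Combine an element's own xml:base (if any) with the inherited base."""
    own = element.get(XML_BASE)
    return inherited if own is None else urljoin(inherited, own)


def _absolute_links(parent, parent_base):
    """Absolute href values of the direct <link> children of parent."""
    return [urljoin(_base_for(link, parent_base), link.get('href', ''))
            for link in parent.findall('{%s}link' % ATOM)]


def parse_links_sketch(xml, feed_url, need_entries=True):
    """Hypothetical, much-reduced analogue of parse_atom (Atom 1.0 only).

    feed_url is the starting base, so relative URLs with no xml:base
    anywhere above them resolve against it.  With need_entries=False the
    <entry> elements are skipped entirely.
    """
    root = ET.fromstring(xml)
    feed_base = _base_for(root, feed_url)
    result = {'links': _absolute_links(root, feed_base), 'entries': []}
    if not need_entries:
        return result
    for entry in root.findall('{%s}entry' % ATOM):
        entry_base = _base_for(entry, feed_base)
        result['entries'].append(_absolute_links(entry, entry_base))
    return result
```

Consider a feed whose `<entry>` carries `xml:base="posts/"` and contains `<link href="1.html"/>`. With feed_url `http://example.com/blog/feed.xml`, the entry link resolves to `http://example.com/blog/posts/1.html`. With `need_entries=False`, the entries list stays empty.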