libearth.parser.autodiscovery — Autodiscovery

This module provides functions to autodiscover feed URLs in a document.

libearth.parser.autodiscovery.ATOM_TYPE = 'application/atom+xml'

(str) The MIME type of the Atom format.

libearth.parser.autodiscovery.RSS_TYPE = 'application/rss+xml'

(str) The MIME type of the RSS 2.0 format.

libearth.parser.autodiscovery.TYPE_TABLE = {parse_atom: 'application/atom+xml', parse_rss: 'application/rss+xml'}

(collections.Mapping) The mapping table of feed types: each parser function is mapped to the MIME type of the format it parses.

class libearth.parser.autodiscovery.AutoDiscovery

Parse the given HTML and try to find the actual feed URLs in it.

Namedtuple which is a pair of type and url.

type

Alias for field number 0

url

Alias for field number 1

exception libearth.parser.autodiscovery.FeedUrlNotFoundError(msg)

Exception raised when a feed URL cannot be found in the HTML.

libearth.parser.autodiscovery.autodiscovery(document, url)

If the given url refers to an actual feed, it returns the given url without any change.

If the given url is the URL of an ordinary web page (i.e. text/html), it finds the URLs of the corresponding feeds. Feed URLs are returned in the lexicographical order of their feed types.

If autodiscovery fails, it raises FeedUrlNotFoundError.

Parameters:
  • document (str) – an HTML or XML string
  • url (str) – the URL used to retrieve the document. If a feed URL in the HTML is relative, it is resolved against this URL
Returns:

list of FeedLink objects

Return type:

collections.MutableSequence
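
The behaviour described above (collect feed links from an HTML page, resolve relative URLs against the page URL, and order the results by feed type) can be sketched with the standard library alone. This is only an illustration under stated assumptions, not libearth's implementation: FeedLinkSketch, LinkCollector and discover_feed_links are hypothetical names, the lookup is limited to link elements with rel="alternate", and LookupError stands in for FeedUrlNotFoundError.

```python
from collections import namedtuple
from html.parser import HTMLParser
from urllib.parse import urljoin

ATOM_TYPE = 'application/atom+xml'
RSS_TYPE = 'application/rss+xml'

# Hypothetical stand-in for the (type, url) pair described above.
FeedLinkSketch = namedtuple('FeedLinkSketch', 'type url')


class LinkCollector(HTMLParser):
    """Collect <link rel="alternate"> elements that advertise a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'link':
            return
        attrs = dict(attrs)
        rel = (attrs.get('rel') or '').lower().split()
        type_ = (attrs.get('type') or '').lower()
        href = attrs.get('href')
        if 'alternate' in rel and type_ in (ATOM_TYPE, RSS_TYPE) and href:
            # Relative hrefs are rebuilt on top of the page URL.
            absolute = urljoin(self.base_url, href)
            self.links.append(FeedLinkSketch(type_, absolute))


def discover_feed_links(html, url):
    """Return feed links found in html, sorted by feed type."""
    collector = LinkCollector(url)
    collector.feed(html)
    collector.close()
    if not collector.links:
        # Assumption: LookupError stands in for FeedUrlNotFoundError.
        raise LookupError('no feed link found in ' + url)
    return sorted(collector.links, key=lambda link: link.type)
```

Calling discover_feed_links(page_html, 'http://example.com/blog/') on a page that advertises both an Atom and an RSS feed would list the Atom link first, because 'application/atom+xml' sorts before 'application/rss+xml'.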

libearth.parser.autodiscovery.get_format(document)

Guess the syndication format of an arbitrary document.

Parameters: document (str, bytes) – the document string whose format is to be guessed
Returns: the function that can parse the given document
Return type: collections.Callable

Changed in version 0.2.0: Before 0.2.0 the function was in the libearth.parser.heuristic module (which has since been removed); it now lives in libearth.parser.autodiscovery.
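
One way to guess a syndication format is to inspect the root element of the document. The sketch below illustrates that idea with the standard library; it is not libearth's heuristic. guess_format, parse_atom_stub and parse_rss_stub are hypothetical names, and returning None for unrecognized input is an assumption rather than documented behaviour.

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'


def parse_atom_stub(document):
    """Hypothetical placeholder for an Atom parser function."""
    return 'atom'


def parse_rss_stub(document):
    """Hypothetical placeholder for an RSS 2.0 parser function."""
    return 'rss'


def guess_format(document):
    """Return a parser stub chosen from the root element, or None."""
    try:
        root = ET.fromstring(document)  # accepts str or bytes
    except ET.ParseError:
        return None  # not well-formed XML, so not a feed we recognize
    if root.tag == '{%s}feed' % ATOM_NS:
        return parse_atom_stub
    if root.tag == 'rss':
        return parse_rss_stub
    return None
```

Returning the parser itself rather than a format name mirrors the documented return type (a callable), so the caller can apply the result to the document directly.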
