libearth.crawler — Crawler
Crawl feeds.
libearth.crawler.DEFAULT_TIMEOUT = 10
   (numbers.Integral) The default timeout for connection attempts: 10 seconds.

   New in version 0.3.0.
exception libearth.crawler.CrawlError(feed_uri, *args, **kwargs)
   Error raised when crawling the given URL fails.

   New in version 0.3.0: Added the feed_uri parameter and the corresponding feed_uri attribute.
class libearth.crawler.CrawlResult(url, feed, hints, icon_url=None)
   The result of each crawl of a feed.

   It mimics a triple of (url, feed, hints) for backward compatibility with versions below 0.3.0, so you can still take these values using tuple unpacking, though that is no longer the recommended way to get them.

   New in version 0.3.0.
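The described triple behavior can be sketched with a tuple subclass. The class below is an illustration of the pattern, not libearth's actual implementation; only the constructor signature mirrors the documented one.

```python
from collections import namedtuple

class CrawlResultSketch(namedtuple('CrawlResultSketch', 'url feed hints')):
    """Illustrative sketch: unpacks like the old (url, feed, hints)
    triple while also exposing named attributes, as CrawlResult does
    for pre-0.3.0 compatibility."""

    def __new__(cls, url, feed, hints, icon_url=None):
        self = super(CrawlResultSketch, cls).__new__(cls, url, feed, hints)
        self.icon_url = icon_url  # extra attribute outside the triple
        return self

result = CrawlResultSketch('http://example.com/feed.xml', '<feed/>',
                           {'skipHours': [3]})

# Recommended style: attribute access
assert result.url == 'http://example.com/feed.xml'

# Legacy style: tuple unpacking still works
url, feed, hints = result
assert hints == {'skipHours': [3]}
```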
   add_as_subscription(subscription_set)
      Add it as a subscription to the given subscription_set.

      Parameters: subscription_set (SubscriptionSet) -- a subscription list or category to add the new subscription to
      Returns: the created subscription object
      Return type: Subscription
   hints = None
      (collections.Mapping) The extra hints for the crawler, e.g. skipHours, skipMinutes, skipDays. It might be None.
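Because hints may be None, lookups should tolerate a missing mapping. A minimal defensive pattern, using a sample dict rather than a real crawl result:

```python
def skip_hours(hints):
    """Return the skipHours hint, tolerating a None mapping."""
    return (hints or {}).get('skipHours', [])

# hints as documented: a mapping of polling hints, or None
hints = {'skipHours': [0, 1, 2], 'skipDays': ['Saturday']}

assert skip_hours(hints) == [0, 1, 2]
assert skip_hours(None) == []  # hints might be None
```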
libearth.crawler.crawl(feed_urls, pool_size, timeout=10)
   Crawl the feeds in the given feed list using threads.

   Parameters:
   - feed_urls -- feed URLs to crawl
   - pool_size (numbers.Integral) -- the number of concurrent workers
   - timeout (numbers.Integral) -- optional timeout for connection attempts. DEFAULT_TIMEOUT is used if omitted

   Returns: a set of CrawlResult objects
   Return type: collections.Iterable

   Changed in version 0.3.0: It now returns a set of CrawlResults instead of tuples.

   Changed in version 0.3.0: The parameter feeds was renamed to feed_urls.

   New in version 0.3.0: Added optional timeout parameter.
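The pool_size/timeout semantics can be sketched in plain Python with a thread pool. fetch() below is a hypothetical stand-in for the library's actual feed fetching; the function is a conceptual illustration, not libearth's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_TIMEOUT = 10  # seconds, mirroring libearth.crawler.DEFAULT_TIMEOUT

def fetch(url, timeout):
    # Hypothetical stand-in for the real HTTP fetch;
    # returns a (url, feed) pair instead of a CrawlResult.
    return url, '<feed from {0}>'.format(url)

def crawl_sketch(feed_urls, pool_size, timeout=DEFAULT_TIMEOUT):
    """Crawl feed_urls concurrently with pool_size worker threads
    and collect the results into a set."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return set(pool.map(lambda url: fetch(url, timeout), feed_urls))

results = crawl_sketch(['http://a.example/feed', 'http://b.example/feed'],
                       pool_size=2)
assert len(results) == 2
```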