Libearth Changelog

Version 0.3.3

Released on November 6, 2014.

  • Fixed a bug where complete() never terminated for documents read() from a single chunk.

Version 0.3.2

Released on November 5, 2014.

Version 0.3.1

Released on July 20, 2014.

  • Fixed two backward compatibility breakages:
    • Subcategory changes were not detected when SubscriptionLists were merged.
    • All child outlines were wiped when a category was deleted.

Version 0.3.0

Released on July 12, 2014.

  • Root MergeableDocumentElements’ __merge_entities__() methods are no longer ignored. Responsibility for merging two documents has moved from the Session.merge() method to the MergeableDocumentElement.__merge_entities__() method.
  • crawl() now returns a set of CrawlResult objects instead of tuples.
  • The feeds parameter of the crawl() function was renamed to feed_urls.
  • Added a feed_uri parameter and the corresponding feed_uri attribute to the CrawlError exception.
  • A timeout option was added to the crawler.
    • Added an optional timeout parameter to crawl().
    • Added an optional timeout parameter to get_feed().
    • Added the DEFAULT_TIMEOUT constant, which is 10 seconds.
  • Added LinkList.favicon property. [issue #49]
  • The Link.relation attribute, which had been optional, is now required.
  • The AutoDiscovery.find_feed_url() method (which returned feed links) was removed. Instead, the AutoDiscovery.find() method (which returns a pair of feed links and favicon links) was introduced. [issue #49]
  • Subscription.icon_uri attribute was introduced. [issue #49]
  • Added an optional icon_uri parameter to SubscriptionSet.subscribe() method. [issue #49]
  • Added normalize_xml_encoding() function to work around the xml.etree.ElementTree module’s encoding detection bug. [issue #41]
  • Added guess_tzinfo_by_locale() function. [issue #41]
  • Added microseconds option to Rfc822 codec.
  • Fixed incorrect merge of subscription/category deletion.
  • Fixed several rss2 parser bugs.
    • The parser now accepts several malformed <pubDate> and <lastBuildDate> elements.
    • When the date time doesn’t give any explicit time zone (which is also malformed), the parser now guesses the time zone from its <language> and the ccTLD (if applicable). [issue #41]
    • It had ignored all <category> elements other than the last one; now it accepts as many as there are.
    • It had ignored <comments> links entirely; now these are parsed to Link objects with relation='discussion'.
    • Some RSS 2 feeds put a URI into <generator>, so the parser now treats it as uri rather than value in that situation.
    • <enclosure> links had been parsed as Link objects without a relation attribute; now the attribute is properly set to 'enclosure'.
    • Mixed <link> elements with the Atom namespace are now parsed properly as well.
  • Fixed several atom parser bugs.
    • It now accepts the obsolete PURL Atom namespace.
    • Some broken Atom feeds (e.g. Naver Blog) provide date time in RFC 822 format, which is incorrect according to RFC 4287 section 3.3, so the parser now accepts RFC 822 format as well.
    • Some broken Atom feeds (e.g. Naver Blog) use the nonstandard <modified> instead of the standard <updated>, so the parser now treats <modified> as equivalent to <updated>.
    • <content> and <summary> can have text/plain and text/html in addition to text and html.
    • <author>/<contributor> is now ignored if it has none of <name>, <uri>, or <email>.
    • Fixed a parser bug where omission of the link[rel] attribute was not interpreted as 'alternate'.
  • Fixed the parser to work well even if there are any file separator characters (FS, '\x1c').
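
The RFC 822 fallback noted above for the Atom parser can be illustrated with a short sketch. This is not Libearth's own code: parse_feed_datetime() is a hypothetical helper, it uses only the Python 3 standard library, and it assumes that trying the RFC 3339 form first and falling back to RFC 822 is an acceptable order.

```python
import datetime
from email.utils import parsedate_to_datetime


def parse_feed_datetime(text):
    """Parse an RFC 3339 timestamp, falling back to RFC 822.

    Hypothetical helper for illustration only; it is not part of Libearth.
    """
    text = text.strip()
    try:
        # RFC 4287 section 3.3 requires RFC 3339.  A trailing "Z" is
        # rewritten to "+00:00" so that older versions of
        # datetime.fromisoformat() accept it too.
        if text.endswith(('Z', 'z')):
            normalized = text[:-1] + '+00:00'
        else:
            normalized = text
        return datetime.datetime.fromisoformat(normalized)
    except ValueError:
        # Not RFC 3339: try the RFC 822 form that some broken feeds emit.
        return parsedate_to_datetime(text)


standard = parse_feed_datetime('2014-07-12T09:30:00Z')
broken = parse_feed_datetime('Sat, 12 Jul 2014 18:30:00 +0900')
```

Both calls return time zone aware datetime objects, so the two values can be compared directly even though they were written in different formats and offsets.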

Version 0.2.1

Released on July 12, 2014.

  • Fixed an rss2 parsing error that occurred when any empty element was present.
  • Fixed a bug where the validate() function raised an error when any subelement had a Text descriptor.

Version 0.2.0

Released on April 22, 2014.

Version 0.1.2

Released on January 19, 2014.

  • XML elements in data files are written in canonical order. For example, the <title> element of the feed was at the back before, but is now in front.
  • write() now stores length hints for children that are multiple, and read() is now aware of the hints. When hints are read, len() for the ElementList is O(1).
  • Fixed a bug where autodiscovery raised AttributeError when the given HTML contained <link> elements to both application/atom+xml and application/rss+xml. [issue #40]
  • If there’s no <title>, it is filled with the <description> (rss2).
  • If there’s no <id>, it is filled with the feed URL (atom).
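
The effect of the length hints mentioned above can be sketched in a few lines. HintedList is a hypothetical class written for this illustration, not Libearth's ElementList; it assumes the hint is stored alongside the data and is trusted when present.

```python
class HintedList:
    """Hypothetical lazily consumed list that honours a stored length hint."""

    def __init__(self, source, length_hint=None):
        self._source = iter(source)      # elements not consumed yet
        self._loaded = []                # elements consumed so far
        self._length_hint = length_hint  # None when no hint was stored

    def __len__(self):
        if self._length_hint is not None:
            # O(1): trust the hint that was written next to the data.
            return self._length_hint
        # No hint: every remaining element must be consumed to count them.
        self._loaded.extend(self._source)
        return len(self._loaded)

    def consumed(self):
        """Return how many elements have been read from the source so far."""
        return len(self._loaded)


hinted = HintedList(iter(range(5)), length_hint=5)
plain = HintedList(iter(range(5)))
hinted_length = len(hinted)  # answered from the hint, nothing is consumed
plain_length = len(plain)    # forces the whole source to be read
```

With a hint, the length is known without touching the underlying source; without one, counting costs a full pass over the children.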

Version 0.1.1

Released on January 2, 2014.

Version 0.1.0

Released on December 13, 2013. Initial alpha version.