Libearth Changelog
Version 0.4.0
To be released.
- Python 3.2 is no longer supported, since even pip 8.0.0 dropped support for it.
- RSS 1.0 feeds can now be parsed. [issue #57]
- Refactored the parser package. [issue #54]
  - Every element parser can be specified using ParserBase and its decorator. When the root element parser is called, its child elements are also parsed in hierarchical order.
  - Basic parsing information is stored in SessionBase and passed from parent parsers to child parsers.
  - Added get_element_id(). It returns a string consisting of an XML namespace and an element tag, in the form that xml.etree.ElementTree recognizes when finding child elements.
  - Added support for Atom feeds whose Text constructs have the xhtml type.
- Introduced the new libearth.defaults module. It provides small utilities and default data for filling the initial state of Earth Reader apps.
- The HTML sanitizer now rebases all links in the given document on the base URI.
  The get_sanitized_html() method was added to the Text type. The sanitize_html() function now additionally requires the base_uri parameter.
- Added get_default_name() for the default session name.
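Rebasing links on a base URI amounts to resolving every relative link attribute against that URI. Below is a minimal standard-library sketch. It assumes that only href and src attributes need rebasing. LinkRebaser and rebase_links are hypothetical names, not libearth's sanitizer API, and the sketch does no sanitizing of its own.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes treated as links in this sketch (an assumption, not libearth's list).
LINK_ATTRIBUTES = frozenset(['href', 'src'])


class LinkRebaser(HTMLParser):
    """Hypothetical helper: re-serializes HTML with links resolved against base_uri.

    Comments, doctypes, and processing instructions are dropped because no
    handlers are defined for them; <script>/<style> bodies get no special care.
    """

    def __init__(self, base_uri):
        super().__init__(convert_charrefs=True)
        self.base_uri = base_uri
        self.parts = []

    def _format_attrs(self, attrs):
        out = []
        for name, value in attrs:
            if value is None:  # valueless attribute such as <input disabled>
                out.append(' ' + name)
                continue
            if name in LINK_ATTRIBUTES:
                value = urljoin(self.base_uri, value)
            out.append(' %s="%s"' % (name, escape(value, quote=True)))
        return ''.join(out)

    def handle_starttag(self, tag, attrs):
        self.parts.append('<%s%s>' % (tag, self._format_attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        self.parts.append('<%s%s/>' % (tag, self._format_attrs(attrs)))

    def handle_endtag(self, tag):
        self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        # Text arrives with character references decoded, so re-escape it.
        self.parts.append(escape(data, quote=False))


def rebase_links(html, base_uri):
    """Return html with every href/src resolved against base_uri."""
    parser = LinkRebaser(base_uri)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)
```

For example, rebase_links('<a href="/about">About</a>', 'http://example.com/blog/') yields '<a href="http://example.com/about">About</a>'. Absolute links pass through unchanged because urljoin returns them as-is.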
Version 0.3.3
Released on November 6, 2014.
- Fixed a bug where complete() never terminated for documents read() from a single chunk.
Version 0.3.2
Released on November 5, 2014.
- Fixed a bug where SubscriptionLists containing Outlines without a created_at attribute failed to merge on Python 3. [issue #65]
- Fixed a bug where, in some cases, a DocumentElement in streamed read mode was not marked as complete even after complete() had completed it.
Version 0.3.1
Released on July 20, 2014.
- Fixed two backward compatibility breakages:
  - Subcategory changes were not detected when SubscriptionLists were merged.
  - All child outlines were wiped when a category was deleted.
Version 0.3.0
Released on July 12, 2014.
- The __merge_entities__() methods of root MergeableDocumentElements are no longer ignored. Responsibility for merging two documents has moved from the Session.merge() method to the MergeableDocumentElement.__merge_entities__() method.
- crawl() now returns a set of CrawlResult objects instead of tuples.
- The feeds parameter of the crawl() function was renamed to feed_urls.
- Added a feed_uri parameter and a corresponding feed_uri attribute to the CrawlError exception.
- A timeout option was added to the crawler.
  - Added an optional timeout parameter to crawl().
  - Added an optional timeout parameter to get_feed().
  - Added the DEFAULT_TIMEOUT constant, which is 10 seconds.
- Added the LinkList.favicon property. [issue #49]
- The Link.relation attribute, which had been optional, is now required.
- The AutoDiscovery.find_feed_url() method (which returned feed links) was removed. Instead, the AutoDiscovery.find() method (which returns a pair of feed links and favicon links) was introduced. [issue #49]
- The Subscription.icon_uri attribute was introduced. [issue #49]
- Added an optional icon_uri parameter to the SubscriptionSet.subscribe() method. [issue #49]
- Added the normalize_xml_encoding() function to work around an encoding detection bug in the xml.etree.ElementTree module. [issue #41]
- Added the guess_tzinfo_by_locale() function. [issue #41]
- Added a microseconds option to the Rfc822 codec.
- Fixed incorrect merging of subscription/category deletions.
- Subscriptions are now archived rather than deleted.
  Outline (the common superclass of Subscription and Category) now has a deleted_at attribute and a deleted property.
- Fixed several rss2 parser bugs.
  - The parser now accepts several kinds of malformed <pubDate> and <lastBuildDate> elements.
  - When a date time gives no explicit time zone (which is also malformed), the parser now guesses the time zone from the feed's <language> and the ccTLD (if applicable). [issue #41]
  - The parser had ignored all <category> elements except the last one; it now accepts all of them.
  - The parser had ignored <comments> links entirely; they are now parsed into Link objects with relation='discussion'.
  - Some RSS 2 feeds put a URI into <generator>, so in that case the parser now treats it as uri rather than value.
  - <enclosure> links had been parsed as Link objects without a relation attribute; the attribute is now properly set to 'enclosure'.
  - <link> elements in the Atom namespace mixed into RSS 2 feeds are now parsed correctly as well.
- Fixed several atom parser bugs.
  - The parser now accepts the obsolete PURL Atom namespace.
  - Since some broken Atom feeds (e.g. Naver Blog) provide date times in RFC 822 format, which is incorrect according to RFC 4287 (section 3.3), the parser now accepts RFC 822 format as well.
  - Some broken Atom feeds (e.g. Naver Blog) use the non-standard <modified> element instead of the standard <updated>, so the parser now treats <modified> as equivalent to <updated>.
  - <content> and <summary> can have the text/plain and text/html types in addition to text and html.
  - <author>/<contributor> is now ignored if it has none of <name>, <uri>, or <email>.
  - Fixed a parser bug where an omitted link[rel] attribute was not interpreted as 'alternate'.
- Fixed the parser to work correctly even when the feed contains file separator characters (FS, '\x1c').
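The crawler changes above include a per-request timeout that defaults to 10 seconds, a set of result objects instead of tuples, and an error that carries the failing feed URI. The sketch below shows these behaviors using only the standard library. It is a sketch only: fetch_feed, crawl_feeds, FetchResult, and FetchError are hypothetical stand-ins and do not reproduce the signatures of libearth's actual crawl() or get_feed().

```python
import collections
import concurrent.futures
import urllib.request

# Mirrors the 10-second default stated in the changelog.
DEFAULT_TIMEOUT = 10

# Hypothetical result type; a namedtuple is hashable, so results fit in a set.
FetchResult = collections.namedtuple('FetchResult', ['feed_uri', 'content'])


class FetchError(IOError):
    """Hypothetical error that records which feed URI failed."""

    def __init__(self, feed_uri, message):
        super().__init__(message)
        self.feed_uri = feed_uri


def fetch_feed(feed_uri, timeout=DEFAULT_TIMEOUT):
    """Download one feed, giving up after `timeout` seconds."""
    try:
        with urllib.request.urlopen(feed_uri, timeout=timeout) as response:
            return FetchResult(feed_uri, response.read())
    except (OSError, ValueError) as e:  # URLError is a subclass of OSError
        raise FetchError(feed_uri, str(e)) from e


def crawl_feeds(feed_urls, pool_size=4, timeout=DEFAULT_TIMEOUT):
    """Fetch all feeds concurrently and return a set of FetchResult objects."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=pool_size) as executor:
        futures = [executor.submit(fetch_feed, url, timeout) for url in feed_urls]
        # result() re-raises a FetchError from any failed fetch.
        return {future.result() for future in futures}
```

Because FetchResult is a namedtuple, results are hashable and can be collected into a set. The caller then no longer depends on the order in which feeds finish downloading.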
Version 0.2.1
Released on July 12, 2014.
- Fixed an rss2 parsing error that occurred when any element was empty.
- Fixed a bug where the validate() function raised an error when any subelement had a Text descriptor.
Version 0.2.0
Released on April 22, 2014.
- Session files in the .sessions/ directory are now touched only once per transaction. [issue #43]
- Added the SubscriptionSet.contains() method, which provides a recursively=True option. It is useful for determining whether a subcategory or subscription is anywhere in the whole tree.
- The Attribute.default option now accepts only callable objects. Before 0.2.0, default was not a function but a value that was used as-is.
- The libearth.parser.heuristic module was removed, and its get_format() function was moved to the libearth.parser.autodiscovery module.
- Added the Link.html property.
- Added the LinkList.permalink property.
- Fixed a FileSystemRepository bug where simultaneous reads and writes to the same key conflicted over the read buffer and produced broken, mixed bytes.
- Fixed broken functions related to repository URLs on Windows.
- Fixed the libearth.compat.parallel.cpu_count() function so that it no longer raises NotImplementedError in some cases.
- Fixed Rfc822 to work properly on non-English locales as well, e.g. ko_KR.
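One likely benefit of a callable default is the same one behind Python's usual advice against mutable default arguments. A plain value would be shared by every object, while a callable can produce a fresh value for each one. Below is a minimal descriptor sketch of that idea. Field, Entry, and the zero-argument call to default are hypothetical; this is not libearth's Attribute implementation.

```python
class Field(object):
    """Hypothetical descriptor whose default must be a callable."""

    def __init__(self, name, default=None):
        if default is not None and not callable(default):
            raise TypeError('default must be callable, not %r' % (default,))
        self.name = name
        self.default = default

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself
        if self.name not in instance.__dict__:
            # Call the factory once per instance and cache its result.
            value = self.default() if self.default is not None else None
            instance.__dict__[self.name] = value
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Entry(object):
    tags = Field('tags', default=list)  # each Entry gets its own list
```

Passing default=[] would have made one list shared across all Entry instances; this constructor rejects such a value up front instead.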
Version 0.1.2
Released on January 19, 2014.
- XML elements in data files are now written in canonical order. For example, the <title> element of a feed was previously written toward the end, but is now written toward the front.
- write() now stores length hints for children that are multiple, and read() is aware of these hints. When the hints are read, len() on an ElementList is O(1).
- Fixed a bug where autodiscovery raised AttributeError when the given HTML contained <link> elements to both application/atom+xml and application/rss+xml. [issue #40]
- If there is no <title>, it is filled from <description> (rss2).
- If there is no <id>, it is filled with the feed URL (atom).
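A length hint lets len() answer without parsing every child. Below is a minimal sketch of the idea. It assumes the hint is simply the stored child count. LazyChildren is a hypothetical class, not libearth's ElementList.

```python
class LazyChildren(object):
    """Hypothetical lazily-parsed sequence that can use a stored length hint."""

    def __init__(self, iterable, length_hint=None):
        self._iterator = iter(iterable)
        self._items = []  # children parsed so far
        self._length_hint = length_hint

    def _consume_until(self, count):
        # Pull children from the iterator until `count` are cached (or it ends).
        while len(self._items) < count:
            try:
                self._items.append(next(self._iterator))
            except StopIteration:
                break

    def __len__(self):
        if self._length_hint is not None:
            return self._length_hint  # O(1): nothing is parsed
        # Without a hint, every remaining child must be parsed to count them.
        self._items.extend(self._iterator)
        return len(self._items)

    def __getitem__(self, index):
        if index < 0:
            raise IndexError('negative indices are not supported in this sketch')
        self._consume_until(index + 1)
        return self._items[index]
```

The trade-off is that the hint is trusted as written. In this sketch, a stale hint would make len() report a wrong count without raising an error.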
Version 0.1.1
Released on January 2, 2014.
- Added a workaround for the thread-unsafety of time.strftime() on CPython. See also http://bugs.python.org/issue7980. [issue #32]
- Fixed a UnicodeDecodeError that was raised when a feed title contained non-ASCII characters. [issue #34 by Jae-Myoung Yu]
- libearth.parser.rss2 now fills Entry.updated_at if it is not given. [issue #35]
- Fixed a TypeError that was raised when a DocumentElement with multiple Child elements was passed to the validate() function.
- Fixed a race condition where two FileSystemRepository objects created the same directory. [issue #36 by klutzy]
- parallel_map() now raises exceptions at the end, if any call raised one. [issue #38]
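Raising exceptions at the end means every task gets a chance to run before a failure is re-raised, instead of aborting at the first error. Below is a sketch of that behavior using concurrent.futures. parallel_map_sketch is a hypothetical name, and this is not libearth.compat.parallel.parallel_map() itself.

```python
import concurrent.futures


def parallel_map_sketch(function, *iterables, pool_size=4):
    """Map function over iterables in threads; raise the first error only at the end."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=pool_size) as executor:
        futures = [executor.submit(function, *args) for args in zip(*iterables)]
    # Leaving the with-block waits for every task, so all of them have run here.
    results = []
    first_error = None
    for future in futures:
        error = future.exception()
        if error is None:
            results.append(future.result())
        elif first_error is None:
            first_error = error  # remember only the first failure, in input order
    if first_error is not None:
        raise first_error
    return results
```

For example, parallel_map_sketch(lambda x, y: x + y, [1, 2], [10, 20]) returns [11, 22]. If one input fails, the remaining inputs are still processed before the error propagates.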
Version 0.1.0
Released on December 13, 2013. Initial alpha version.