libearth.sanitizer — Sanitize HTML tags

class libearth.sanitizer.HtmlSanitizer
    HTML parser that is internally used by the sanitize_html() function.

    DISALLOWED_SCHEMES = frozenset(['about', 'jscript', 'livescript', 'javascript', 'mocha', 'vbscript', 'data'])
        (collections.Set) The set of disallowed URI schemes, e.g. javascript:.

    DISALLOWED_STYLE_PATTERN
        (re.RegexObject) The compiled regular expression pattern that matches disallowed CSS properties.
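
As a rough illustration of how constants like these two could be applied, here is a minimal sketch in Python 3. It is not libearth's code: the helper names are hypothetical, and since the actual regular expression is not shown in this reference, STYLE_PATTERN is an assumed stand-in that only covers display: none.

```python
import re

# Same members as the documented DISALLOWED_SCHEMES value.
DISALLOWED_SCHEMES = frozenset([
    'about', 'jscript', 'livescript', 'javascript',
    'mocha', 'vbscript', 'data',
])

# Assumption: a stand-in for DISALLOWED_STYLE_PATTERN. The real pattern
# is not reproduced in this reference and may match more properties.
STYLE_PATTERN = re.compile(
    r'(^|;)\s*display\s*:\s*none\s*(;|$)', re.IGNORECASE
)


def has_disallowed_scheme(uri):
    """Return True if ``uri`` begins with one of the disallowed schemes."""
    scheme, sep, _ = uri.strip().partition(':')
    # Without a colon there is no scheme at all (e.g. a relative path).
    return bool(sep) and scheme.lower() in DISALLOWED_SCHEMES


def has_disallowed_style(style):
    """Return True if the inline ``style`` value hides the element."""
    return STYLE_PATTERN.search(style) is not None
```

Comparing the lowercased scheme against a frozenset keeps the check case-insensitive and constant-time per lookup.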

class libearth.sanitizer.MarkupTagCleaner
    HTML parser that is internally used by the clean_html() function.

libearth.sanitizer.clean_html(html)
    Strip all markup tags from the html string. That is, it turns the given html document into plain text.

    Parameters: html (str) – html string to clean
    Returns: cleaned plain text
    Return type: str
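
The idea of reducing a document to plain text with a parser can be sketched using only the Python 3 standard library. This is an illustration, not libearth's implementation: TagStripper and strip_tags are hypothetical names standing in for MarkupTagCleaner and clean_html(), and the real function may treat whitespace, entities, or script content differently.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical stand-in for MarkupTagCleaner: collects text only."""

    def __init__(self):
        # convert_charrefs=True turns references such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self.fragments = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are never
        # recorded, so they disappear from the result.
        self.fragments.append(data)


def strip_tags(html):
    """Return the text content of ``html`` with all markup removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.fragments)
```

For example, strip_tags('<p>Hello, <b>world</b>!</p>') yields the text Hello, world! with no markup left.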

libearth.sanitizer.sanitize_html(html)
    Sanitize the given html string. It removes the following tags and attributes, which are neither secure nor useful for RSS reader layout:

    - <script> tags
    - display: none; styles
    - JavaScript event attributes, e.g. onclick, onload
    - href attributes that start with javascript:, jscript:, livescript:, vbscript:, data:, about:, or mocha:

    Parameters: html (str) – html string to sanitize
    Returns: sanitized html string
    Return type: str
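
The four removal rules listed for sanitize_html() can be sketched as a small parser-based filter in Python 3. Everything here is an assumption made for illustration: SketchSanitizer and sanitize are hypothetical names, the style pattern is a guess, and libearth's HtmlSanitizer may handle void elements, comments, and edge cases differently.

```python
import re
from html import escape
from html.parser import HTMLParser

# Schemes named in the description of sanitize_html().
SCHEMES = frozenset(['javascript', 'jscript', 'livescript', 'vbscript',
                     'data', 'about', 'mocha'])
# Assumption: a simple pattern for the "display: none;" rule.
HIDDEN = re.compile(r'display\s*:\s*none\s*;?', re.IGNORECASE)


def bad_href(value):
    """Return True if the href value starts with a disallowed scheme."""
    scheme, sep, _ = value.strip().partition(':')
    return bool(sep) and scheme.lower() in SCHEMES


class SketchSanitizer(HTMLParser):
    """Hypothetical illustration; not libearth's HtmlSanitizer."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self.in_script = True  # drop the tag and its content
            return
        kept = []
        for name, value in attrs:
            value = value or ''  # valueless attributes arrive as None
            if name.startswith('on'):
                continue  # JavaScript event attribute
            if name == 'href' and bad_href(value):
                continue
            if name == 'style':
                value = HIDDEN.sub('', value).strip()
            kept.append(' {0}="{1}"'.format(name, escape(value)))
        self.out.append('<{0}{1}>'.format(tag, ''.join(kept)))

    def handle_endtag(self, tag):
        if tag == 'script':
            self.in_script = False
        else:
            self.out.append('</{0}>'.format(tag))

    def handle_data(self, data):
        if not self.in_script:
            # Text was unescaped by the parser, so escape it again.
            self.out.append(escape(data, quote=False))


def sanitize(html):
    """Return ``html`` with the four kinds of unsafe markup removed."""
    parser = SketchSanitizer()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)
```

Unlike tag stripping, this filter re-emits the markup it accepts, so the result is still HTML. Comments and doctype declarations are dropped in this sketch because no handler re-emits them.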