comparison env/lib/python3.9/site-packages/bleach/_vendor/html5lib-1.1.dist-info/METADATA @ 0:4f3585e2f14b draft default tip
"planemo upload commit 60cee0fc7c0cda8592644e1aad72851dec82c959"
author | shellac |
---|---|
date | Mon, 22 Mar 2021 18:12:50 +0000 |
-1:000000000000 | 0:4f3585e2f14b |
---|---|
1 Metadata-Version: 2.1 | |
2 Name: html5lib | |
3 Version: 1.1 | |
4 Summary: HTML parser based on the WHATWG HTML specification | |
5 Home-page: https://github.com/html5lib/html5lib-python | |
6 Maintainer: James Graham | |
7 Maintainer-email: james@hoppipolla.co.uk | |
8 License: MIT License | |
9 Platform: UNKNOWN | |
10 Classifier: Development Status :: 5 - Production/Stable | |
11 Classifier: Intended Audience :: Developers | |
12 Classifier: License :: OSI Approved :: MIT License | |
13 Classifier: Operating System :: OS Independent | |
14 Classifier: Programming Language :: Python | |
15 Classifier: Programming Language :: Python :: 2 | |
16 Classifier: Programming Language :: Python :: 2.7 | |
17 Classifier: Programming Language :: Python :: 3 | |
18 Classifier: Programming Language :: Python :: 3.5 | |
19 Classifier: Programming Language :: Python :: 3.6 | |
20 Classifier: Programming Language :: Python :: 3.7 | |
21 Classifier: Programming Language :: Python :: 3.8 | |
22 Classifier: Programming Language :: Python :: Implementation :: CPython | |
23 Classifier: Programming Language :: Python :: Implementation :: PyPy | |
24 Classifier: Topic :: Software Development :: Libraries :: Python Modules | |
25 Classifier: Topic :: Text Processing :: Markup :: HTML | |
26 Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.* | |
27 Requires-Dist: six (>=1.9) | |
28 Requires-Dist: webencodings | |
29 Provides-Extra: all | |
30 Requires-Dist: genshi ; extra == 'all' | |
31 Requires-Dist: chardet (>=2.2) ; extra == 'all' | |
32 Requires-Dist: lxml ; (platform_python_implementation == 'CPython') and extra == 'all' | |
33 Provides-Extra: chardet | |
34 Requires-Dist: chardet (>=2.2) ; extra == 'chardet' | |
35 Provides-Extra: genshi | |
36 Requires-Dist: genshi ; extra == 'genshi' | |
37 Provides-Extra: lxml | |
38 Requires-Dist: lxml ; (platform_python_implementation == 'CPython') and extra == 'lxml' | |
39 | |
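The header block above uses the RFC 822-style ``Key: value`` layout, so it can be read with the standard library's ``email`` package. The following is a minimal sketch, not how any installer necessarily does it: the ``METADATA`` string is a shortened copy of the fields shown above, and the rule that a requirement without ``;`` is unconditional is a simplification for illustration.

```python
from email.parser import HeaderParser

# Shortened copy of the header fields shown above (not the full file).
METADATA = """\
Metadata-Version: 2.1
Name: html5lib
Version: 1.1
Requires-Dist: six (>=1.9)
Requires-Dist: webencodings
Provides-Extra: lxml
Requires-Dist: lxml ; (platform_python_implementation == 'CPython') and extra == 'lxml'
"""

headers = HeaderParser().parsestr(METADATA)

name = headers["Name"]
version = headers["Version"]

# Repeated fields such as Requires-Dist are collected with get_all().
requires = headers.get_all("Requires-Dist")

# Simplification: an entry without an environment marker (no ";")
# is treated as an unconditional dependency.
unconditional = [r for r in requires if ";" not in r]
```

Here ``unconditional`` holds the two entries that are installed regardless of extras, ``six (>=1.9)`` and ``webencodings``.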
40 html5lib | |
41 ======== | |
42 | |
43 .. image:: https://travis-ci.org/html5lib/html5lib-python.svg?branch=master | |
44 :target: https://travis-ci.org/html5lib/html5lib-python | |
45 | |
46 | |
47 html5lib is a pure-python library for parsing HTML. It is designed to | |
48 conform to the WHATWG HTML specification, as is implemented by all major | |
49 web browsers. | |
50 | |
51 | |
52 Usage | |
53 ----- | |
54 | |
55 Simple usage follows this pattern: | |
56 | |
57 .. code-block:: python | |
58 | |
59   import html5lib | |
60   with open("mydocument.html", "rb") as f: | |
61       document = html5lib.parse(f) | |
62 | |
63 or: | |
64 | |
65 .. code-block:: python | |
66 | |
67   import html5lib | |
68   document = html5lib.parse("<p>Hello World!") | |
69 | |
70 By default, the ``document`` will be an ``xml.etree`` element instance. | |
71 Whenever possible, html5lib chooses the accelerated ``ElementTree`` | |
72 implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x). | |
73 | |
74 Two other tree types are supported: ``xml.dom.minidom`` and | |
75 ``lxml.etree``. To use an alternative format, specify the name of | |
76 a treebuilder: | |
77 | |
78 .. code-block:: python | |
79 | |
80   import html5lib | |
81   with open("mydocument.html", "rb") as f: | |
82       lxml_etree_document = html5lib.parse(f, treebuilder="lxml") | |
83 | |
84 When used with ``urllib2`` (Python 2), the charset from HTTP should be | |
85 passed to html5lib as follows: | |
86 | |
87 .. code-block:: python | |
88 | |
89   from contextlib import closing | |
90   from urllib2 import urlopen | |
91   import html5lib | |
92 | |
93   with closing(urlopen("http://example.com/")) as f: | |
94       document = html5lib.parse(f, transport_encoding=f.info().getparam("charset")) | |
95 | |
96 When used with ``urllib.request`` (Python 3), the charset from HTTP | |
97 should be passed to html5lib as follows: | |
98 | |
99 .. code-block:: python | |
100 | |
101   from urllib.request import urlopen | |
102   import html5lib | |
103 | |
104   with urlopen("http://example.com/") as f: | |
105       document = html5lib.parse(f, transport_encoding=f.info().get_content_charset()) | |
106 | |
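In the ``urllib.request`` example, ``f.info()`` returns the response headers as a subclass of ``email.message.Message``, so ``get_content_charset()`` is the standard-library method of that name. The sketch below shows what it yields without making a network request; the ``charset_of`` helper and the header values are made up for illustration.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    headers = Message()
    headers["Content-Type"] = content_type
    return headers.get_content_charset()


# A header that declares a charset, and one that does not.
declared = charset_of("text/html; charset=ISO-8859-1")
missing = charset_of("text/html")
```

The returned name is lower-cased, and the result is ``None`` when the header declares no charset, so ``transport_encoding`` may receive ``None`` for such responses.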
107 To have more control over the parser, create a parser object explicitly. | |
108 For instance, to make the parser raise exceptions on parse errors, use: | |
109 | |
110 .. code-block:: python | |
111 | |
112   import html5lib | |
113   with open("mydocument.html", "rb") as f: | |
114       parser = html5lib.HTMLParser(strict=True) | |
115       document = parser.parse(f) | |
116 | |
117 When you're instantiating parser objects explicitly, pass a treebuilder | |
118 class as the ``tree`` keyword argument to use an alternative document | |
119 format: | |
120 | |
121 .. code-block:: python | |
122 | |
123   import html5lib | |
124   parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom")) | |
125   minidom_document = parser.parse("<p>Hello World!") | |
126 | |
127 More documentation is available at https://html5lib.readthedocs.io/. | |
128 | |
129 | |
130 Installation | |
131 ------------ | |
132 | |
133 html5lib works on CPython 2.7+, CPython 3.5+ and PyPy. To install: | |
134 | |
135 .. code-block:: bash | |
136 | |
137 $ pip install html5lib | |
138 | |
139 The goal is to support a (non-strict) superset of the versions that `pip | |
140 supports | |
141 <https://pip.pypa.io/en/stable/installing/#python-and-os-compatibility>`_. | |
142 | |
143 Optional Dependencies | |
144 --------------------- | |
145 | |
146 The following third-party libraries may be used for additional | |
147 functionality: | |
148 | |
149 - ``lxml`` is supported as a tree format (for both building and | |
150 walking) under CPython (but *not* PyPy where it is known to cause | |
151 segfaults); | |
152 | |
153 - ``genshi`` has a treewalker (but not builder); and | |
154 | |
155 - ``chardet`` can be used as a fallback when character encoding cannot | |
156 be determined. | |
157 | |
158 | |
159 Bugs | |
160 ---- | |
161 | |
162 Please report any bugs on the `issue tracker | |
163 <https://github.com/html5lib/html5lib-python/issues>`_. | |
164 | |
165 | |
166 Tests | |
167 ----- | |
168 | |
169 Unit tests require the ``pytest`` and ``mock`` libraries and can be | |
170 run using the ``py.test`` command in the root directory. | |
171 | |
172 Test data are contained in a separate `html5lib-tests | |
173 <https://github.com/html5lib/html5lib-tests>`_ repository and included | |
174 as a submodule, thus for git checkouts they must be initialized:: | |
175 | |
176 $ git submodule init | |
177 $ git submodule update | |
178 | |
179 If you have all compatible Python implementations available on your | |
180 system, you can run tests on all of them using the ``tox`` utility, | |
181 which can be found on PyPI. | |
182 | |
183 | |
184 Questions? | |
185 ---------- | |
186 | |
187 There's a mailing list available for support on Google Groups, | |
188 `html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_, | |
189 though you may get a quicker response asking on IRC in `#whatwg on | |
190 irc.freenode.net <http://wiki.whatwg.org/wiki/IRC>`_. | |
191 | |
192 Change Log | |
193 ---------- | |
194 | |
195 1.1 | |
196 ~~~ | |
197 | |
198 UNRELEASED | |
199 | |
200 Breaking changes: | |
201 | |
202 * Drop support for Python 3.3. (#358) | |
203 * Drop support for Python 3.4. (#421) | |
204 | |
205 Deprecations: | |
206 | |
207 * Deprecate the ``html5lib`` sanitizer (``html5lib.serialize(sanitize=True)`` and | |
208 ``html5lib.filters.sanitizer``). We recommend users migrate to `Bleach | |
209 <https://github.com/mozilla/bleach>`_. Please let us know if Bleach doesn't suffice for your | |
210 use. (#443) | |
211 | |
212 Other changes: | |
213 | |
214 * Try to import from ``collections.abc`` to remove DeprecationWarning and ensure | |
215 ``html5lib`` keeps working in future Python versions. (#403) | |
216 * Drop optional ``datrie`` dependency. (#442) | |
217 | |
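The ``collections.abc`` change in the list above follows a common compatibility pattern: try the Python 3 location first and fall back to the old one. This is a generic sketch of that pattern, not a copy of html5lib's own code, and the names imported here are chosen only for illustration.

```python
try:
    # Python 3.3+: the abstract base classes live in collections.abc.
    from collections.abc import Mapping, MutableMapping
except ImportError:
    # Python 2: the same classes are exposed directly on collections.
    from collections import Mapping, MutableMapping

# Either import yields the same abstract base classes.
is_mapping = isinstance({}, Mapping)
is_mutable = issubclass(dict, MutableMapping)
```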
218 | |
219 1.0.1 | |
220 ~~~~~ | |
221 | |
222 Released on December 7, 2017 | |
223 | |
224 Breaking changes: | |
225 | |
226 * Drop support for Python 2.6. (#330) (Thank you, Hugo, Will Kahn-Greene!) | |
227 * Remove ``utils/spider.py`` (#353) (Thank you, Jon Dufresne!) | |
228 | |
229 Features: | |
230 | |
231 * Improve documentation. (#300, #307) (Thank you, Jon Dufresne, Tom Most, | |
232 Will Kahn-Greene!) | |
233 * Add iframe seamless boolean attribute. (Thank you, Ritwik Gupta!) | |
234 * Add itemscope as a boolean attribute. (#194) (Thank you, Jonathan Vanasco!) | |
235 * Support Python 3.6. (#333) (Thank you, Jon Dufresne!) | |
236 * Add CI support for Windows using AppVeyor. (Thank you, John Vandenberg!) | |
237 * Improve testing and CI and add code coverage. (#323, #334) (Thank you, Jon | |
238 Dufresne, John Vandenberg, Sam Sneddon, Will Kahn-Greene!) | |
239 * Semver-compliant version number. | |
240 | |
241 Bug fixes: | |
242 | |
243 * Add support for setuptools < 18.5 to support environment markers. (Thank you, | |
244 John Vandenberg!) | |
245 * Add explicit dependency for six >= 1.9. (Thank you, Eric Amorde!) | |
246 * Fix regexes to work with Python 3.7 regex adjustments. (#318, #379) (Thank | |
247 you, Benedikt Morbach, Ville Skyttä, Mark Vasilkov!) | |
248 * Fix alphabeticalattributes filter namespace bug. (#324) (Thank you, Will | |
249 Kahn-Greene!) | |
250 * Include license file in generated wheel package. (#350) (Thank you, Jon | |
251 Dufresne!) | |
252 * Fix annotation-xml typo. (#339) (Thank you, Will Kahn-Greene!) | |
253 * Allow uppercase hex characters in CSS colour check. (#377) (Thank you, | |
254 Komal Dembla, Hugo!) | |
255 | |
256 | |
257 1.0 | |
258 ~~~ | |
259 | |
260 Released and unreleased on December 7, 2017. Badly packaged release. | |
261 | |
262 | |
263 0.999999999/1.0b10 | |
264 ~~~~~~~~~~~~~~~~~~ | |
265 | |
266 Released on July 15, 2016 | |
267 | |
268 * Fix attribute order going to the tree builder to be document order | |
269 instead of reverse document order(!). | |
270 | |
271 | |
272 0.99999999/1.0b9 | |
273 ~~~~~~~~~~~~~~~~ | |
274 | |
275 Released on July 14, 2016 | |
276 | |
277 * **Added ordereddict as a mandatory dependency on Python 2.6.** | |
278 | |
279 * Added ``lxml``, ``genshi``, ``datrie``, ``charade``, and ``all`` | |
280 extras that will do the right thing based on the specific | |
281 interpreter implementation. | |
282 | |
283 * Now requires the ``mock`` package for the testsuite. | |
284 | |
285 * Cease supporting DATrie under PyPy. | |
286 | |
287 * **Remove PullDOM support, as this hasn't ever been properly | |
288 tested, doesn't entirely work, and as far as I can tell is | |
289 completely unused by anyone.** | |
290 | |
291 * Move testsuite to ``py.test``. | |
292 | |
293 * **Fix #124: move to webencodings for decoding the input byte stream; | |
294 this makes html5lib compliant with the Encoding Standard, and | |
295 introduces a required dependency on webencodings.** | |
296 | |
297 * **Cease supporting Python 3.2 (in both CPython and PyPy forms).** | |
298 | |
299 * **Fix comments containing double-dash with lxml 3.5 and above.** | |
300 | |
301 * **Use scripting disabled by default (as we don't implement | |
302 scripting).** | |
303 | |
304 * **Fix #11, avoiding the XSS bug potentially caused by serializer | |
305 allowing attribute values to be escaped out of in old browser versions, | |
306 changing the quote_attr_values option on serializer to take one of | |
307 three values, "always" (the old True value), "legacy" (the new option, | |
308 and the new default), and "spec" (the old False value, and the old | |
309 default).** | |
310 | |
311 * **Fix #72 by rewriting the sanitizer to apply only to treewalkers | |
312 (instead of the tokenizer); as such, this will require amending all | |
313 callers of it to use it via the treewalker API.** | |
314 | |
315 * **Drop support of charade, now that chardet is supported once more.** | |
316 | |
317 * **Replace the charset keyword argument on parse and related methods | |
318 with a set of keyword arguments: override_encoding, transport_encoding, | |
319 same_origin_parent_encoding, likely_encoding, and default_encoding.** | |
320 | |
321 * **Move filters._base, treebuilder._base, and treewalkers._base to .base | |
322 to clarify their status as public.** | |
323 | |
324 * **Get rid of the sanitizer package. Merge sanitizer.sanitize into the | |
325 sanitizer.htmlsanitizer module and move that to sanitizer. This means | |
326 anyone who used sanitizer.sanitize or sanitizer.HTMLSanitizer needs no | |
327 code changes.** | |
328 | |
329 * **Rename treewalkers.lxmletree to .etree_lxml and | |
330 treewalkers.genshistream to .genshi to have a consistent API.** | |
331 | |
332 * Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer, | |
333 utils) to be underscore prefixed to clarify their status as private. | |
334 | |
335 | |
336 0.9999999/1.0b8 | |
337 ~~~~~~~~~~~~~~~ | |
338 | |
339 Released on September 10, 2015 | |
340 | |
341 * Fix #195: fix the sanitizer to drop broken URLs (it threw an | |
342 exception between 0.9999 and 0.999999). | |
343 | |
344 | |
345 0.999999/1.0b7 | |
346 ~~~~~~~~~~~~~~ | |
347 | |
348 Released on July 7, 2015 | |
349 | |
350 * Fix #189: fix the sanitizer to allow relative URLs again (as it did | |
351 prior to 0.9999/1.0b5). | |
352 | |
353 | |
354 0.99999/1.0b6 | |
355 ~~~~~~~~~~~~~ | |
356 | |
357 Released on April 30, 2015 | |
358 | |
359 * Fix #188: fix the sanitizer to not throw an exception when sanitizing | |
360 bogus data URLs. | |
361 | |
362 | |
363 0.9999/1.0b5 | |
364 ~~~~~~~~~~~~ | |
365 | |
366 Released on April 29, 2015 | |
367 | |
368 * Fix #153: Sanitizer fails to treat some attributes as URLs. Despite how | |
369 this sounds, this has no known security implications. No known version | |
370 of IE (5.5 to current), Firefox (3 to current), Safari (6 to current), | |
371 Chrome (1 to current), or Opera (12 to current) will run any script | |
372 provided in these attributes. | |
373 | |
374 * Pass error message to the ParseError exception in strict parsing mode. | |
375 | |
376 * Allow data URIs in the sanitizer, with a whitelist of content-types. | |
377 | |
378 * Add support for Python implementations that don't support lone | |
379 surrogates (read: Jython). Fixes #2. | |
380 | |
381 * Remove localization of error messages. This functionality was totally | |
382 unused (and untested that everything was localizable), so we may as | |
383 well follow numerous browsers in not supporting translating technical | |
384 strings. | |
385 | |
386 * Expose treewalkers.pprint as a public API. | |
387 | |
388 * Add a documentEncoding property to HTML5Parser, fix #121. | |
389 | |
390 | |
391 0.999 | |
392 ~~~~~ | |
393 | |
394 Released on December 23, 2013 | |
395 | |
396 * Fix #127: add work-around for CPython issue #20007: .read(0) on | |
397 http.client.HTTPResponse drops the rest of the content. | |
398 | |
399 * Fix #115: lxml treewalker can now deal with fragments containing, at | |
400 their root level, text nodes with non-ASCII characters on Python 2. | |
401 | |
402 | |
403 0.99 | |
404 ~~~~ | |
405 | |
406 Released on September 10, 2013 | |
407 | |
408 * No library changes from 1.0b3; released as 0.99 as pip has changed | |
409 behaviour from 1.4 to avoid installing pre-release versions per | |
410 PEP 440. | |
411 | |
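The 0.99 entry above turns on whether a version string counts as a pre-release. The sketch below is a deliberately simplified check, assuming only the ``a``/``b``/``rc`` suffix forms; it is not a full PEP 440 parser and ignores dev releases, epochs, and local versions. The ``looks_like_prerelease`` helper is hypothetical.

```python
import re

# Simplified: dotted release numbers followed by an a/b/rc segment.
_PRE_RELEASE = re.compile(r"^\d+(?:\.\d+)*(?:a|b|rc)\d+$")


def looks_like_prerelease(version):
    """Return True if the version ends in an a/b/rc pre-release segment."""
    return _PRE_RELEASE.match(version) is not None


# Version strings taken from the change log headings.
results = {
    v: looks_like_prerelease(v)
    for v in ("1.0b3", "0.99", "1.0b10", "0.999999999")
}
```

Under this check ``1.0b3`` and ``1.0b10`` are pre-releases while ``0.99`` and ``0.999999999`` are not, which matches the dual numbering used in the headings of this change log.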
412 | |
413 1.0b3 | |
414 ~~~~~ | |
415 | |
416 Released on July 24, 2013 | |
417 | |
418 * Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any | |
419 implementation using it should be moved to | |
420 ``NonRecursiveTreeWalker``, as everything bundled with html5lib has | |
421 for years. | |
422 | |
423 * Fix #67 so that ``BufferedStream`` correctly returns a bytes | |
424 object, thereby fixing any case where html5lib is passed a | |
425 non-seekable RawIOBase-like object. | |
426 | |
427 | |
428 1.0b2 | |
429 ~~~~~ | |
430 | |
431 Released on June 27, 2013 | |
432 | |
433 * Removed reordering of attributes within the serializer. There is now | |
434 an ``alphabetical_attributes`` option which preserves the previous | |
435 behaviour through a new filter. This allows attribute order to be | |
436 preserved through html5lib if the tree builder preserves order. | |
437 | |
438 * Removed ``dom2sax`` from DOM treebuilders. It has been replaced by | |
439 ``treeadapters.sax.to_sax`` which is generic and supports any | |
440 treewalker; it also resolves all known bugs with ``dom2sax``. | |
441 | |
442 * Fix treewalker assertions on hitting bytes strings on | |
443 Python 2. Previous to 1.0b1, treewalkers coped with mixed | |
444 bytes/unicode data on Python 2; this reintroduces this prior | |
445 behaviour on Python 2. Behaviour is unchanged on Python 3. | |
446 | |
447 | |
448 1.0b1 | |
449 ~~~~~ | |
450 | |
451 Released on May 17, 2013 | |
452 | |
453 * Implementation updated to implement the `HTML specification | |
454 <http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May | |
455 2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867). | |
456 | |
457 * Python 3.2+ supported in a single codebase using the ``six`` library. | |
458 | |
459 * Removed support for Python 2.5 and older. | |
460 | |
461 * Removed the deprecated Beautiful Soup 3 treebuilder. | |
462 ``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that | |
463 since it doesn't support namespaces, foreign content like SVG and | |
464 MathML is parsed incorrectly. | |
465 | |
466 * Removed ``simpletree`` from the package. The default tree builder is | |
467 now ``etree`` (using the ``xml.etree.cElementTree`` implementation if | |
468 available, and ``xml.etree.ElementTree`` otherwise). | |
469 | |
470 * Removed the ``XHTMLSerializer`` as it never actually guaranteed its | |
471 output was well-formed XML, and hence provided little of use. | |
472 | |
473 * Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no | |
474 longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will | |
475 return the default DOM treebuilder, which uses ``xml.dom.minidom``. | |
476 | |
477 * Optional heuristic character encoding detection now based on | |
478 ``charade`` for Python 2.6 - 3.3 compatibility. | |
479 | |
480 * Optional ``Genshi`` treewalker support fixed. | |
481 | |
482 * Many bugfixes, including: | |
483 | |
484 * #33: null in attribute value breaks XML AttValue; | |
485 | |
486 * #4: nested, indirect descendant, <button> causes infinite loop; | |
487 | |
488 * `Google Code 215 | |
489 <http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly | |
490 detect seekable streams; | |
491 | |
492 * `Google Code 206 | |
493 <http://code.google.com/p/html5lib/issues/detail?id=206>`_: add | |
494 support for <video preload=...>, <audio preload=...>; | |
495 | |
496 * `Google Code 205 | |
497 <http://code.google.com/p/html5lib/issues/detail?id=205>`_: add | |
498 support for <video poster=...>; | |
499 | |
500 * `Google Code 202 | |
501 <http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode | |
502 file breaks InputStream. | |
503 | |
504 * Source code is now mostly PEP 8 compliant. | |
505 | |
506 * Test harness has been improved and now depends on ``nose``. | |
507 | |
508 * Documentation updated and moved to https://html5lib.readthedocs.io/. | |
509 | |
510 | |
511 0.95 | |
512 ~~~~ | |
513 | |
514 Released on February 11, 2012 | |
515 | |
516 | |
517 0.90 | |
518 ~~~~ | |
519 | |
520 Released on January 17, 2010 | |
521 | |
522 | |
523 0.11.1 | |
524 ~~~~~~ | |
525 | |
526 Released on June 12, 2008 | |
527 | |
528 | |
529 0.11 | |
530 ~~~~ | |
531 | |
532 Released on June 10, 2008 | |
533 | |
534 | |
535 0.10 | |
536 ~~~~ | |
537 | |
538 Released on October 7, 2007 | |
539 | |
540 | |
541 0.9 | |
542 ~~~ | |
543 | |
544 Released on March 11, 2007 | |
545 | |
546 | |
547 0.2 | |
548 ~~~ | |
549 | |
550 Released on January 8, 2007 | |
551 | |
552 |