comparison env/lib/python3.9/site-packages/bleach/_vendor/html5lib-1.1.dist-info/METADATA @ 0:4f3585e2f14b draft default tip

"planemo upload commit 60cee0fc7c0cda8592644e1aad72851dec82c959"
author shellac
date Mon, 22 Mar 2021 18:12:50 +0000
-1:000000000000 0:4f3585e2f14b
1 Metadata-Version: 2.1
2 Name: html5lib
3 Version: 1.1
4 Summary: HTML parser based on the WHATWG HTML specification
5 Home-page: https://github.com/html5lib/html5lib-python
6 Maintainer: James Graham
7 Maintainer-email: james@hoppipolla.co.uk
8 License: MIT License
9 Platform: UNKNOWN
10 Classifier: Development Status :: 5 - Production/Stable
11 Classifier: Intended Audience :: Developers
12 Classifier: License :: OSI Approved :: MIT License
13 Classifier: Operating System :: OS Independent
14 Classifier: Programming Language :: Python
15 Classifier: Programming Language :: Python :: 2
16 Classifier: Programming Language :: Python :: 2.7
17 Classifier: Programming Language :: Python :: 3
18 Classifier: Programming Language :: Python :: 3.5
19 Classifier: Programming Language :: Python :: 3.6
20 Classifier: Programming Language :: Python :: 3.7
21 Classifier: Programming Language :: Python :: 3.8
22 Classifier: Programming Language :: Python :: Implementation :: CPython
23 Classifier: Programming Language :: Python :: Implementation :: PyPy
24 Classifier: Topic :: Software Development :: Libraries :: Python Modules
25 Classifier: Topic :: Text Processing :: Markup :: HTML
26 Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
27 Requires-Dist: six (>=1.9)
28 Requires-Dist: webencodings
29 Provides-Extra: all
30 Requires-Dist: genshi ; extra == 'all'
31 Requires-Dist: chardet (>=2.2) ; extra == 'all'
32 Requires-Dist: lxml ; (platform_python_implementation == 'CPython') and extra == 'all'
33 Provides-Extra: chardet
34 Requires-Dist: chardet (>=2.2) ; extra == 'chardet'
35 Provides-Extra: genshi
36 Requires-Dist: genshi ; extra == 'genshi'
37 Provides-Extra: lxml
38 Requires-Dist: lxml ; (platform_python_implementation == 'CPython') and extra == 'lxml'
39
40 html5lib
41 ========
42
43 .. image:: https://travis-ci.org/html5lib/html5lib-python.svg?branch=master
44 :target: https://travis-ci.org/html5lib/html5lib-python
45
46
47 html5lib is a pure-Python library for parsing HTML. It is designed to
48 conform to the WHATWG HTML specification, as implemented by all major
49 web browsers.
50
51
52 Usage
53 -----
54
55 Simple usage follows this pattern:
56
.. code-block:: python

    import html5lib

    # Open in binary mode so html5lib can determine the character encoding.
    with open("mydocument.html", "rb") as f:
        document = html5lib.parse(f)
62
63 or:
64
.. code-block:: python

    import html5lib
    document = html5lib.parse("<p>Hello World!")
69
70 By default, the ``document`` will be an ``xml.etree`` element instance.
71 Whenever possible, html5lib chooses the accelerated ``ElementTree``
72 implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x).
73
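As a minimal sketch of what the default return value looks like (assuming the default ``etree`` treebuilder and html5lib's default behaviour of placing HTML elements in the XHTML namespace), the root element and its descendants carry namespaced tag names, so lookups need the namespaced form. The ``XHTML`` constant is a name introduced only for this illustration:

```python
import html5lib

# Hypothetical helper constant for this example: the XHTML namespace
# that html5lib applies to HTML elements by default.
XHTML = "{http://www.w3.org/1999/xhtml}"

document = html5lib.parse("<p>Hello World!")

# The returned object is the root <html> element; tag names include
# the namespace prefix.
root_tag = document.tag

# ElementTree path lookups must therefore use the namespaced tag name.
paragraph = document.find(".//" + XHTML + "p")
paragraph_text = paragraph.text
```

Passing ``namespaceHTMLElements=False`` to ``html5lib.parse`` should yield plain tag names instead, if namespaced lookups are inconvenient.
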
74 Two other tree types are supported: ``xml.dom.minidom`` and
75 ``lxml.etree``. To use an alternative format, specify the name of
76 a treebuilder:
77
.. code-block:: python

    import html5lib
    with open("mydocument.html", "rb") as f:
        lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
83
84 When used with ``urllib2`` (Python 2), the charset from the HTTP
85 response should be passed to html5lib as follows:
86
.. code-block:: python

    from contextlib import closing
    from urllib2 import urlopen
    import html5lib

    with closing(urlopen("http://example.com/")) as f:
        document = html5lib.parse(f, transport_encoding=f.info().getparam("charset"))
95
96 When used with ``urllib.request`` (Python 3), the charset from the HTTP
97 response should be passed to html5lib as follows:
98
.. code-block:: python

    from urllib.request import urlopen
    import html5lib

    with urlopen("http://example.com/") as f:
        document = html5lib.parse(f, transport_encoding=f.info().get_content_charset())
106
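The effect of ``transport_encoding`` can also be tried without a network connection. The following is a sketch only: the byte string and the ``charset_from_header`` value are made up for illustration and stand in for a real HTTP response body and its ``Content-Type`` charset.

```python
import html5lib

# Hypothetical helper constant: the XHTML namespace html5lib applies by default.
XHTML = "{http://www.w3.org/1999/xhtml}"

# Bytes as they might arrive over HTTP; 0xE9 is "é" in ISO-8859-1.
raw = b"<!DOCTYPE html><title>Demo</title><p>caf\xe9"

# Assumed header value for this example; in real code it would be read
# from the response, as in the urllib snippets above.
charset_from_header = "iso-8859-1"

# The encoding keyword arguments only matter for byte input; text input
# is already decoded.
document = html5lib.parse(raw, transport_encoding=charset_from_header)
text = document.find(".//" + XHTML + "p").text
```
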
107 To have more control over the parser, create a parser object explicitly.
108 For instance, to make the parser raise exceptions on parse errors, use:
109
.. code-block:: python

    import html5lib
    with open("mydocument.html", "rb") as f:
        parser = html5lib.HTMLParser(strict=True)
        document = parser.parse(f)
116
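A short sketch of how strict mode differs from the default, assuming the exception class is importable as ``html5lib.html5parser.ParseError`` (as it is in html5lib 1.x). The sample markup lacks a doctype, which the HTML specification treats as a parse error:

```python
import html5lib
from html5lib.html5parser import ParseError

# A missing doctype is a parse error per the HTML specification.
markup = "<p>Hello World!"

# Default (lenient) mode: errors are recorded on the parser and parsing
# continues, so a document is still produced.
lenient = html5lib.HTMLParser()
lenient.parse(markup)
recorded_errors = len(lenient.errors)

# Strict mode: the first parse error raises ParseError instead.
strict = html5lib.HTMLParser(strict=True)
try:
    strict.parse(markup)
    raised = False
except ParseError as exc:
    raised = True
    message = str(exc)
```

Strict mode is mainly useful for validation-style checks; for general scraping the lenient default is usually the better fit, since real-world HTML is rarely error-free.
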
117 When you're instantiating parser objects explicitly, pass a treebuilder
118 class as the ``tree`` keyword argument to use an alternative document
119 format:
120
.. code-block:: python

    import html5lib
    parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
    minidom_document = parser.parse("<p>Hello World!")
126
127 More documentation is available at https://html5lib.readthedocs.io/.
128
129
130 Installation
131 ------------
132
133 html5lib works on CPython 2.7+, CPython 3.5+ and PyPy. To install:
134
.. code-block:: bash

    $ pip install html5lib
138
139 The goal is to support a (non-strict) superset of the versions that `pip
140 supports
141 <https://pip.pypa.io/en/stable/installing/#python-and-os-compatibility>`_.
142
143 Optional Dependencies
144 ---------------------
145
146 The following third-party libraries may be used for additional
147 functionality:
148
149 - ``lxml`` is supported as a tree format (for both building and
150 walking) under CPython (but *not* PyPy where it is known to cause
151 segfaults);
152
153 - ``genshi`` has a treewalker (but not builder); and
154
155 - ``chardet`` can be used as a fallback when character encoding cannot
156 be determined.
157
158
159 Bugs
160 ----
161
162 Please report any bugs on the `issue tracker
163 <https://github.com/html5lib/html5lib-python/issues>`_.
164
165
166 Tests
167 -----
168
169 Unit tests require the ``pytest`` and ``mock`` libraries and can be
170 run using the ``py.test`` command in the root directory.
171
172 Test data are contained in a separate `html5lib-tests
173 <https://github.com/html5lib/html5lib-tests>`_ repository and included
174 as a submodule, thus for git checkouts they must be initialized::
175
    $ git submodule init
    $ git submodule update
178
179 If you have all compatible Python implementations available on your
180 system, you can run tests on all of them using the ``tox`` utility,
181 which can be found on PyPI.
182
183
184 Questions?
185 ----------
186
187 There's a mailing list available for support on Google Groups,
188 `html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_,
189 though you may get a quicker response asking on IRC in `#whatwg on
190 irc.freenode.net <http://wiki.whatwg.org/wiki/IRC>`_.
191
192 Change Log
193 ----------
194
195 1.1
196 ~~~
197
198 UNRELEASED
199
200 Breaking changes:
201
202 * Drop support for Python 3.3. (#358)
203 * Drop support for Python 3.4. (#421)
204
205 Deprecations:
206
207 * Deprecate the ``html5lib`` sanitizer (``html5lib.serialize(sanitize=True)`` and
208 ``html5lib.filters.sanitizer``). We recommend users migrate to `Bleach
209 <https://github.com/mozilla/bleach>`_. Please let us know if Bleach doesn't suffice for your
210 use. (#443)
211
212 Other changes:
213
214 * Try to import from ``collections.abc`` to remove DeprecationWarning and ensure
215 ``html5lib`` keeps working in future Python versions. (#403)
216 * Drop optional ``datrie`` dependency. (#442)
217
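The ``collections.abc`` change follows a common compatibility pattern. The sketch below illustrates the general technique rather than html5lib's actual source; ``describe`` is a hypothetical helper added only to show the imported names in use:

```python
try:
    # Python 3.3+: the abstract base classes live in collections.abc.
    from collections.abc import Mapping, MutableMapping
except ImportError:
    # Python 2 fallback: the same names are exposed by collections.
    from collections import Mapping, MutableMapping


def describe(obj):
    """Hypothetical helper: classify an object using the imported ABCs."""
    if isinstance(obj, MutableMapping):
        return "mutable mapping"
    if isinstance(obj, Mapping):
        return "read-only mapping"
    return "not a mapping"
```

Trying the new location first avoids the ``DeprecationWarning`` on Python versions that still offered the old aliases, and keeps working once those aliases are removed.
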
218
219 1.0.1
220 ~~~~~
221
222 Released on December 7, 2017
223
224 Breaking changes:
225
226 * Drop support for Python 2.6. (#330) (Thank you, Hugo, Will Kahn-Greene!)
227 * Remove ``utils/spider.py`` (#353) (Thank you, Jon Dufresne!)
228
229 Features:
230
231 * Improve documentation. (#300, #307) (Thank you, Jon Dufresne, Tom Most,
232 Will Kahn-Greene!)
233 * Add iframe seamless boolean attribute. (Thank you, Ritwik Gupta!)
234 * Add itemscope as a boolean attribute. (#194) (Thank you, Jonathan Vanasco!)
235 * Support Python 3.6. (#333) (Thank you, Jon Dufresne!)
236 * Add CI support for Windows using AppVeyor. (Thank you, John Vandenberg!)
237 * Improve testing and CI and add code coverage. (#323, #334) (Thank you, Jon
238 Dufresne, John Vandenberg, Sam Sneddon, Will Kahn-Greene!)
239 * Semver-compliant version number.
240
241 Bug fixes:
242
243 * Add support for setuptools < 18.5 to support environment markers. (Thank you,
244 John Vandenberg!)
245 * Add explicit dependency for six >= 1.9. (Thank you, Eric Amorde!)
246 * Fix regexes to work with Python 3.7 regex adjustments. (#318, #379) (Thank
247 you, Benedikt Morbach, Ville Skyttä, Mark Vasilkov!)
248 * Fix alphabeticalattributes filter namespace bug. (#324) (Thank you, Will
249 Kahn-Greene!)
250 * Include license file in generated wheel package. (#350) (Thank you, Jon
251 Dufresne!)
252 * Fix annotation-xml typo. (#339) (Thank you, Will Kahn-Greene!)
253 * Allow uppercase hex characters in CSS colour check. (#377) (Thank you,
254 Komal Dembla, Hugo!)
255
256
257 1.0
258 ~~~
259
260 Released and unreleased on December 7, 2017. Badly packaged release.
261
262
263 0.999999999/1.0b10
264 ~~~~~~~~~~~~~~~~~~
265
266 Released on July 15, 2016
267
268 * Fix attribute order going to the tree builder to be document order
269 instead of reverse document order(!).
270
271
272 0.99999999/1.0b9
273 ~~~~~~~~~~~~~~~~
274
275 Released on July 14, 2016
276
277 * **Added ordereddict as a mandatory dependency on Python 2.6.**
278
279 * Added ``lxml``, ``genshi``, ``datrie``, ``charade``, and ``all``
280 extras that will do the right thing based on the specific
281 interpreter implementation.
282
283 * Now requires the ``mock`` package for the testsuite.
284
285 * Cease supporting DATrie under PyPy.
286
287 * **Remove PullDOM support, as this hasn't ever been properly
288 tested, doesn't entirely work, and as far as I can tell is
289 completely unused by anyone.**
290
291 * Move testsuite to ``py.test``.
292
293 * **Fix #124: move to webencodings for decoding the input byte stream;
294 this makes html5lib compliant with the Encoding Standard, and
295 introduces a required dependency on webencodings.**
296
297 * **Cease supporting Python 3.2 (in both CPython and PyPy forms).**
298
299 * **Fix comments containing double-dash with lxml 3.5 and above.**
300
301 * **Use scripting disabled by default (as we don't implement
302 scripting).**
303
304 * **Fix #11, avoiding the XSS bug potentially caused by serializer
305 allowing attribute values to be escaped out of in old browser versions,
306 changing the quote_attr_values option on serializer to take one of
307 three values, "always" (the old True value), "legacy" (the new option,
308 and the new default), and "spec" (the old False value, and the old
309 default).**
310
311 * **Fix #72 by rewriting the sanitizer to apply only to treewalkers
312 (instead of the tokenizer); as such, this will require amending all
313 callers of it to use it via the treewalker API.**
314
315 * **Drop support of charade, now that chardet is supported once more.**
316
317 * **Replace the charset keyword argument on parse and related methods
318 with a set of keyword arguments: override_encoding, transport_encoding,
319 same_origin_parent_encoding, likely_encoding, and default_encoding.**
320
321 * **Move filters._base, treebuilder._base, and treewalkers._base to .base
322 to clarify their status as public.**
323
324 * **Get rid of the sanitizer package. Merge sanitizer.sanitize into the
325 sanitizer.htmlsanitizer module and move that to sanitizer. This means
326 anyone who used sanitizer.sanitize or sanitizer.HTMLSanitizer needs no
327 code changes.**
328
329 * **Rename treewalkers.lxmletree to .etree_lxml and
330 treewalkers.genshistream to .genshi to have a consistent API.**
331
332 * Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer,
333 utils) to be underscore prefixed to clarify their status as private.
334
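To illustrate the three ``quote_attr_values`` settings mentioned in the list above, here is a small sketch using ``html5lib.serialize``. The sample markup and the ``render`` helper are invented for this example; only the option names and values come from the change log entry.

```python
import html5lib

# Sample input: one attribute value without spaces, one with a space.
document = html5lib.parse('<!DOCTYPE html><p class=foo title="a b">Hi!</p>')


def render(mode):
    """Hypothetical helper: serialize the document with one quoting mode."""
    # omit_optional_tags=False keeps every tag so the output is easy to read.
    return html5lib.serialize(
        document, quote_attr_values=mode, omit_optional_tags=False
    )


always = render("always")  # quote every attribute value
legacy = render("legacy")  # default: also quote where old browsers need it
spec = render("spec")      # quote only where the specification requires it
```

In all three modes a value containing whitespace is quoted; the modes differ in how eagerly they quote values that the specification would allow to stand bare.
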
335
336 0.9999999/1.0b8
337 ~~~~~~~~~~~~~~~
338
339 Released on September 10, 2015
340
341 * Fix #195: fix the sanitizer to drop broken URLs (it threw an
342 exception between 0.9999 and 0.999999).
343
344
345 0.999999/1.0b7
346 ~~~~~~~~~~~~~~
347
348 Released on July 7, 2015
349
350 * Fix #189: fix the sanitizer to allow relative URLs again (as it did
351 prior to 0.9999/1.0b5).
352
353
354 0.99999/1.0b6
355 ~~~~~~~~~~~~~
356
357 Released on April 30, 2015
358
359 * Fix #188: fix the sanitizer to not throw an exception when sanitizing
360 bogus data URLs.
361
362
363 0.9999/1.0b5
364 ~~~~~~~~~~~~
365
366 Released on April 29, 2015
367
368 * Fix #153: Sanitizer fails to treat some attributes as URLs. Despite how
369 this sounds, this has no known security implications. No known version
370 of IE (5.5 to current), Firefox (3 to current), Safari (6 to current),
371 Chrome (1 to current), or Opera (12 to current) will run any script
372 provided in these attributes.
373
374 * Pass error message to the ParseError exception in strict parsing mode.
375
376 * Allow data URIs in the sanitizer, with a whitelist of content-types.
377
378 * Add support for Python implementations that don't support lone
379 surrogates (read: Jython). Fixes #2.
380
381 * Remove localization of error messages. This functionality was totally
382 unused (and untested that everything was localizable), so we may as
383 well follow numerous browsers in not supporting translating technical
384 strings.
385
386 * Expose treewalkers.pprint as a public API.
387
388 * Add a documentEncoding property to HTML5Parser, fix #121.
389
390
391 0.999
392 ~~~~~
393
394 Released on December 23, 2013
395
396 * Fix #127: add work-around for CPython issue #20007: .read(0) on
397 http.client.HTTPResponse drops the rest of the content.
398
399 * Fix #115: lxml treewalker can now deal with fragments containing, at
400 their root level, text nodes with non-ASCII characters on Python 2.
401
402
403 0.99
404 ~~~~
405
406 Released on September 10, 2013
407
408 * No library changes from 1.0b3; released as 0.99 as pip has changed
409 behaviour from 1.4 to avoid installing pre-release versions per
410 PEP 440.
411
412
413 1.0b3
414 ~~~~~
415
416 Released on July 24, 2013
417
418 * Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any
419 implementation using it should be moved to
420 ``NonRecursiveTreeWalker``, as everything bundled with html5lib has
421 for years.
422
423 * Fix #67 so that ``BufferedStream`` correctly returns a bytes
424 object, thereby fixing any case where html5lib is passed a
425 non-seekable RawIOBase-like object.
426
427
428 1.0b2
429 ~~~~~
430
431 Released on June 27, 2013
432
433 * Removed reordering of attributes within the serializer. There is now
434 an ``alphabetical_attributes`` option which preserves the previous
435 behaviour through a new filter. This allows attribute order to be
436 preserved through html5lib if the tree builder preserves order.
437
438 * Removed ``dom2sax`` from DOM treebuilders. It has been replaced by
439 ``treeadapters.sax.to_sax`` which is generic and supports any
440 treewalker; it also resolves all known bugs with ``dom2sax``.
441
442 * Fix treewalker assertions on hitting bytes strings on
443 Python 2. Previous to 1.0b1, treewalkers coped with mixed
444 bytes/unicode data on Python 2; this reintroduces this prior
445 behaviour on Python 2. Behaviour is unchanged on Python 3.
446
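A brief sketch of the ``alphabetical_attributes`` serializer option described above, using made-up sample markup. Without the option, attribute order follows whatever the tree builder preserved; with it, attributes are sorted by name:

```python
import html5lib

# Attributes deliberately given in reverse alphabetical order.
document = html5lib.parse('<!DOCTYPE html><p title=t id=i class=c>Hi!</p>')

# With the option enabled, attributes are emitted in alphabetical order
# regardless of their order in the source.
sorted_html = html5lib.serialize(
    document, alphabetical_attributes=True, omit_optional_tags=False
)
```

Sorted output can be handy when comparing serialized documents in tests, where a stable attribute order matters more than fidelity to the source.
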
447
448 1.0b1
449 ~~~~~
450
451 Released on May 17, 2013
452
453 * Implementation updated to implement the `HTML specification
454 <http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
455 2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).
456
457 * Python 3.2+ supported in a single codebase using the ``six`` library.
458
459 * Removed support for Python 2.5 and older.
460
461 * Removed the deprecated Beautiful Soup 3 treebuilder.
462 ``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
463 since it doesn't support namespaces, foreign content like SVG and
464 MathML is parsed incorrectly.
465
466 * Removed ``simpletree`` from the package. The default tree builder is
467 now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
468 available, and ``xml.etree.ElementTree`` otherwise).
469
470 * Removed the ``XHTMLSerializer`` as it never actually guaranteed its
471 output was well-formed XML, and hence provided little of use.
472
473 * Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no
474 longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will
475 return the default DOM treebuilder, which uses ``xml.dom.minidom``.
476
477 * Optional heuristic character encoding detection now based on
478 ``charade`` for Python 2.6 - 3.3 compatibility.
479
480 * Optional ``Genshi`` treewalker support fixed.
481
482 * Many bugfixes, including:
483
484 * #33: null in attribute value breaks XML AttValue;
485
486 * #4: nested, indirect descendant, <button> causes infinite loop;
487
488 * `Google Code 215
489 <http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly
490 detect seekable streams;
491
492 * `Google Code 206
493 <http://code.google.com/p/html5lib/issues/detail?id=206>`_: add
494 support for <video preload=...>, <audio preload=...>;
495
496 * `Google Code 205
497 <http://code.google.com/p/html5lib/issues/detail?id=205>`_: add
498 support for <video poster=...>;
499
500 * `Google Code 202
501 <http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode
502 file breaks InputStream.
503
504 * Source code is now mostly PEP 8 compliant.
505
506 * Test harness has been improved and now depends on ``nose``.
507
508 * Documentation updated and moved to https://html5lib.readthedocs.io/.
509
510
511 0.95
512 ~~~~
513
514 Released on February 11, 2012
515
516
517 0.90
518 ~~~~
519
520 Released on January 17, 2010
521
522
523 0.11.1
524 ~~~~~~
525
526 Released on June 12, 2008
527
528
529 0.11
530 ~~~~
531
532 Released on June 10, 2008
533
534
535 0.10
536 ~~~~
537
538 Released on October 7, 2007
539
540
541 0.9
542 ~~~
543
544 Released on March 11, 2007
545
546
547 0.2
548 ~~~
549
550 Released on January 8, 2007
551
552