Commit 20d6cba3 authored by Stefano Rivera's avatar Stefano Rivera

New upstream version 4.6.3

parent 9786b08f
Beautiful Soup is made available under the MIT license:
Copyright (c) 2004-2016 Leonard Richardson
Copyright (c) 2004-2018 Leonard Richardson
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
......
= 4.6.3 (20180812)
* Exactly the same as 4.6.2. Re-released to make the README file
render properly on PyPI.
= 4.6.2 (20180812)
* Fix an exception when a custom formatter was asked to format a void
element. [bug=1784408]
= 4.6.1 (20180728)
* Stop data loss when encountering an empty numeric entity, and
......
Metadata-Version: 1.1
Metadata-Version: 2.1
Name: beautifulsoup4
Version: 4.6.1
Version: 4.6.3
Summary: Screen-scraping library
Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
Author: Leonard Richardson
Author-email: leonardr@segfault.org
License: MIT
Download-URL: http://www.crummy.com/software/BeautifulSoup/bs4/download/
Description: Beautiful Soup sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
Description: Beautiful Soup is a library that makes it easy to scrape information
from web pages. It sits atop an HTML or XML parser, providing Pythonic
idioms for iterating, searching, and modifying the parse tree.
# Quick start
```
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("<p>Some<b>bad<i>HTML")
>>> print soup.prettify()
<html>
<body>
<p>
Some
<b>
bad
<i>
HTML
</i>
</b>
</p>
</body>
</html>
>>> soup.find(text="bad")
u'bad'
>>> soup.i
<i>HTML</i>
>>> soup = BeautifulSoup("<tag1>Some<tag2/>bad<tag3>XML", "xml")
>>> print soup.prettify()
<?xml version="1.0" encoding="utf-8">
<tag1>
Some
<tag2 />
bad
<tag3>
XML
</tag3>
</tag1>
```
To go beyond the basics, [comprehensive documentation is available](http://www.crummy.com/software/BeautifulSoup/bs4/doc/).
# Links
* [Homepage](http://www.crummy.com/software/BeautifulSoup/bs4/)
* [Documentation](http://www.crummy.com/software/BeautifulSoup/bs4/doc/)
* [Discussion group](http://groups.google.com/group/beautifulsoup/)
* [Development](https://code.launchpad.net/beautifulsoup/)
* [Bug tracker](https://bugs.launchpad.net/beautifulsoup/)
* [Complete changelog](https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/NEWS.txt)
# Building the documentation
The bs4/doc/ directory contains full documentation in Sphinx
format. Run `make html` in that directory to create HTML
documentation.
# Running the unit tests
Beautiful Soup supports unit test discovery from the project root directory:
```
$ nosetests
```
```
$ python -m unittest discover -s bs4 # Python 2.7 and up
```
If you checked out the source tree, you should see a script in the
home directory called test-all-versions. This script will run the unit
tests under Python 2.7, then create a temporary Python 3 conversion of
the source and run the unit tests again under Python 3.
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
......@@ -19,3 +95,6 @@ Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Text Processing :: Markup :: SGML
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
Provides-Extra: lxml
Provides-Extra: html5lib
= Introduction =
Beautiful Soup is a library that makes it easy to scrape information
from web pages. It sits atop an HTML or XML parser, providing Pythonic
idioms for iterating, searching, and modifying the parse tree.
# Quick start
```
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("<p>Some<b>bad<i>HTML")
>>> print soup.prettify()
......@@ -33,31 +38,39 @@
XML
</tag3>
</tag1>
```
To go beyond the basics, [comprehensive documentation is available](http://www.crummy.com/software/BeautifulSoup/bs4/doc/).
# Links
= Full documentation =
* [Homepage](http://www.crummy.com/software/BeautifulSoup/bs4/)
* [Documentation](http://www.crummy.com/software/BeautifulSoup/bs4/doc/)
* [Discussion group](http://groups.google.com/group/beautifulsoup/)
* [Development](https://code.launchpad.net/beautifulsoup/)
* [Bug tracker](https://bugs.launchpad.net/beautifulsoup/)
* [Complete changelog](https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/NEWS.txt)
# Building the documentation
The bs4/doc/ directory contains full documentation in Sphinx
format. Run "make html" in that directory to create HTML
format. Run `make html` in that directory to create HTML
documentation.
= Running the unit tests =
# Running the unit tests
Beautiful Soup supports unit test discovery from the project root directory:
```
$ nosetests
```
```
$ python -m unittest discover -s bs4 # Python 2.7 and up
```
If you checked out the source tree, you should see a script in the
home directory called test-all-versions. This script will run the unit
tests under Python 2.7, then create a temporary Python 3 conversion of
the source and run the unit tests again under Python 3.
= Links =
Homepage: http://www.crummy.com/software/BeautifulSoup/bs4/
Documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/
http://readthedocs.org/docs/beautiful-soup-4/
Discussion group: http://groups.google.com/group/beautifulsoup/
Development: https://code.launchpad.net/beautifulsoup/
Bug tracker: https://bugs.launchpad.net/beautifulsoup/
Metadata-Version: 1.1
Metadata-Version: 2.1
Name: beautifulsoup4
Version: 4.6.1
Version: 4.6.3
Summary: Screen-scraping library
Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
Author: Leonard Richardson
Author-email: leonardr@segfault.org
License: MIT
Download-URL: http://www.crummy.com/software/BeautifulSoup/bs4/download/
Description: Beautiful Soup sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
Description: Beautiful Soup is a library that makes it easy to scrape information
from web pages. It sits atop an HTML or XML parser, providing Pythonic
idioms for iterating, searching, and modifying the parse tree.
# Quick start
```
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("<p>Some<b>bad<i>HTML")
>>> print soup.prettify()
<html>
<body>
<p>
Some
<b>
bad
<i>
HTML
</i>
</b>
</p>
</body>
</html>
>>> soup.find(text="bad")
u'bad'
>>> soup.i
<i>HTML</i>
>>> soup = BeautifulSoup("<tag1>Some<tag2/>bad<tag3>XML", "xml")
>>> print soup.prettify()
<?xml version="1.0" encoding="utf-8">
<tag1>
Some
<tag2 />
bad
<tag3>
XML
</tag3>
</tag1>
```
To go beyond the basics, [comprehensive documentation is available](http://www.crummy.com/software/BeautifulSoup/bs4/doc/).
# Links
* [Homepage](http://www.crummy.com/software/BeautifulSoup/bs4/)
* [Documentation](http://www.crummy.com/software/BeautifulSoup/bs4/doc/)
* [Discussion group](http://groups.google.com/group/beautifulsoup/)
* [Development](https://code.launchpad.net/beautifulsoup/)
* [Bug tracker](https://bugs.launchpad.net/beautifulsoup/)
* [Complete changelog](https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/NEWS.txt)
# Building the documentation
The bs4/doc/ directory contains full documentation in Sphinx
format. Run `make html` in that directory to create HTML
documentation.
# Running the unit tests
Beautiful Soup supports unit test discovery from the project root directory:
```
$ nosetests
```
```
$ python -m unittest discover -s bs4 # Python 2.7 and up
```
If you checked out the source tree, you should see a script in the
home directory called test-all-versions. This script will run the unit
tests under Python 2.7, then create a temporary Python 3 conversion of
the source and run the unit tests again under Python 3.
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
......@@ -19,3 +95,6 @@ Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Text Processing :: Markup :: SGML
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
Provides-Extra: lxml
Provides-Extra: html5lib
......@@ -3,7 +3,7 @@ COPYING.txt
LICENSE
MANIFEST.in
NEWS.txt
README.txt
README.md
TODO.txt
convert-py3k
setup.cfg
......
......@@ -21,7 +21,7 @@ http://www.crummy.com/software/BeautifulSoup/bs4/doc/
# found in the LICENSE file.
__author__ = "Leonard Richardson (leonardr@segfault.org)"
__version__ = "4.6.1"
__version__ = "4.6.3"
__copyright__ = "Copyright (c) 2004-2018 Leonard Richardson"
__license__ = "MIT"
......
......@@ -239,6 +239,12 @@ class HTMLTreeBuilder(TreeBuilder):
# These are from earlier versions of HTML and are removed in HTML5.
'basefont', 'bgsound', 'command', 'frame', 'image', 'isindex', 'nextid', 'spacer'
])
# The HTML standard defines these as block-level elements. Beautiful
# Soup does not treat these elements differently from other elements,
# but it may do so eventually, and this information is available if
# you need to use it.
block_elements = set(["address", "article", "aside", "blockquote", "canvas", "dd", "div", "dl", "dt", "fieldset", "figcaption", "figure", "footer", "form", "h1", "h2", "h3", "h4", "h5", "h6", "header", "hr", "li", "main", "nav", "noscript", "ol", "output", "p", "pre", "section", "table", "tfoot", "ul", "video"])
# The HTML standard defines these attributes as containing a
# space-separated list of values, not a single value. That is,
......
......@@ -1223,7 +1223,9 @@ class Tag(PageElement):
prefix = self.prefix + ":"
if self.is_empty_element:
close = formatter.void_element_close_prefix or ''
close = ''
if isinstance(formatter, Formatter):
close = formatter.void_element_close_prefix or close
else:
closeTag = '</%s%s>' % (prefix, self.name)
......
......@@ -1474,14 +1474,14 @@ class TestSubstitutions(SoupTest):
self.document_for(u"<b><<Sacr\N{LATIN SMALL LETTER E WITH ACUTE} bleu!>></b>"))
def test_formatter_custom(self):
markup = u"<b>&lt;foo&gt;</b><b>bar</b>"
markup = u"<b>&lt;foo&gt;</b><b>bar</b><br/>"
soup = self.soup(markup)
decoded = soup.decode(formatter = lambda x: x.upper())
# Instead of normal entity conversion code, the custom
# callable is called on every string.
self.assertEqual(
decoded,
self.document_for(u"<b><FOO></b><b>BAR</b>"))
self.document_for(u"<b><FOO></b><b>BAR</b><br>"))
def test_formatter_is_run_on_attribute_values(self):
markup = u'<a href="http://a.com?a=b&c=é">e</a>'
......
[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0
......@@ -3,15 +3,19 @@ from setuptools import (
find_packages,
)
with open("README.md", "r") as fh:
long_description = fh.read()
setup(
name="beautifulsoup4",
version = "4.6.1",
version = "4.6.3",
author="Leonard Richardson",
author_email='leonardr@segfault.org',
url="http://www.crummy.com/software/BeautifulSoup/bs4/",
download_url = "http://www.crummy.com/software/BeautifulSoup/bs4/download/",
description="Screen-scraping library",
long_description="""Beautiful Soup sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.""",
long_description=long_description,
long_description_content_type="text/markdown",
license="MIT",
packages=find_packages(exclude=['tests*']),
extras_require = {
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment