Commit 16c1c4fe authored by SVN-Git Migration

Imported Upstream version 4.0.0~b8

parent 2fd0ee03
= 4.0.0b8 (20120224) =
* All tree builders now preserve namespace information in the
documents they parse. If you use the html5lib parser or lxml's XML
parser, you can access the namespace URL for a tag as tag.namespace.
However, there is no special support for namespace-oriented
searching or tree manipulation. When you search the tree, you need
to use namespace prefixes exactly as they're used in the original
document.
* The string representation of a DOCTYPE always ends in a newline.
* Issue a warning if the user tries to use a SoupStrainer in
conjunction with the html5lib tree builder, which doesn't support
them.
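To show what "namespace URL" means here: XML parsers such as lxml, and the standard library's ElementTree, report a namespaced tag name in "Clark notation", `{url}localname`. In that form the prefix has already been resolved to its URL, so the prefix itself is gone. The sketch below uses only `xml.etree.ElementTree`. It illustrates the underlying data and does not show Beautiful Soup's `tag.namespace` API. `split_clark_name` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def split_clark_name(tag):
    """Split a Clark-notation name such as '{http://www.w3.org/2000/svg}rect'
    into (namespace_url, local_name). Plain names return (None, name)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


markup = (
    '<root xmlns:svg="http://www.w3.org/2000/svg">'
    '<svg:rect width="10"/>'
    '</root>'
)
root = ET.fromstring(markup)
for child in root:
    # ElementTree has already replaced the "svg:" prefix with its URL,
    # so the prefix is not recoverable from child.tag alone.
    print(split_clark_name(child.tag))
```

Because the prefix is lost at this stage, a tree builder that wants to search by prefix, as the entry above requires, has to track the prefix-to-URL mappings separately.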
= 4.0.0b7 (20120223) =
* When encoding to a bytestring, any characters that can't be represented in
your chosen encoding will be converted into numeric XML character
references.
* Issue a warning if characters were replaced with REPLACEMENT
CHARACTER during Unicode conversion.
* Restored compatibility with Python 2.6.
* The install process no longer installs docs or auxiliary text files.
* It's now possible to deepcopy a BeautifulSoup object created with
Python's built-in HTML parser.
* About 100 unit tests that "test" the behavior of various parsers on
invalid markup have been removed. Legitimate changes to those
parsers caused these tests to fail, indicating that perhaps
Beautiful Soup should not test the behavior of foreign
libraries.
The problematic unit tests have been reformulated as informational
comparisons generated by the script
scripts/demonstrate_parser_differences.py.
This makes Beautiful Soup compatible with html5lib version 0.95 and
future versions of HTMLParser.
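The first 4.0.0b7 entry describes the same output that Python's built-in `xmlcharrefreplace` codec error handler produces. Below is a minimal Python 3 sketch. `encode_with_char_refs` is a hypothetical name, and this is not necessarily how Beautiful Soup implements the feature.

```python
def encode_with_char_refs(text, encoding):
    """Encode text, turning characters the target encoding can't
    represent into numeric character references such as &#9731;."""
    return text.encode(encoding, "xmlcharrefreplace")


print(encode_with_char_refs("caf\u00e9 \u2603", "ascii"))    # b'caf&#233; &#9731;'
print(encode_with_char_refs("caf\u00e9 \u2603", "latin-1"))  # b'caf\xe9 &#9731;'
```

Only characters that are missing from the chosen encoding get replaced. In the example, Latin-1 can represent "é" but not the snowman.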
= 4.0.0b6 (20120216) =
* Multi-valued attributes like "class" always have a list of values,
even if there's only one value in the list.
* Added a number of multi-valued attributes defined in HTML5.
* Stopped generating a space before the slash that closes an
empty-element tag. This may come back if I add a special XHTML mode
(http://www.w3.org/TR/xhtml1/#C_2), but right now it's pretty
useless.
* Passing text along with tag-specific arguments to a find* method:
find("a", text="Click here")
will find tags that contain the given text as their
.string. Previously, the tag-specific arguments were ignored and
only strings were searched.
* Fixed a bug that caused the html5lib tree builder to build a
partially disconnected tree. Generally cleaned up the html5lib tree
builder.
* If you restrict a multi-valued attribute like "class" to a string
that contains spaces, Beautiful Soup will only consider it a match
if the values correspond to that specific string.
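The 4.0.0b6 entries set two rules for multi-valued attributes. First, the value is always stored as a list. Second, a search string that contains spaces matches only the whole value, not the individual items. Below is a minimal Python 3 sketch of both rules. `split_cdata_list` and `class_matches` are hypothetical names. Rebuilding the original string by joining the list with single spaces is an assumption made for illustration.

```python
def split_cdata_list(value):
    """Split a whitespace-separated attribute value into a list.
    A single value still becomes a one-element list."""
    return value.split()


def class_matches(tag_values, wanted):
    """Hypothetical matcher for a multi-valued attribute like "class".

    A search string without spaces matches any single value. A string
    that contains spaces matches only the full value string."""
    if " " in wanted:
        return " ".join(tag_values) == wanted
    return wanted in tag_values


classes = split_cdata_list("nav main")
print(classes)                             # ['nav', 'main']
print(class_matches(classes, "main"))      # True
print(class_matches(classes, "nav main"))  # True
print(class_matches(classes, "main nav"))  # False
```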
= 4.0.0b5 (20120209) =
* Rationalized Beautiful Soup's treatment of CSS class. A tag
......
Metadata-Version: 1.0
Name: beautifulsoup4
Version: 4.0.0b5
Version: 4.0.0b8
Summary: UNKNOWN
Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
Author: Leonard Richardson
......
......@@ -42,10 +42,11 @@ documentation.
= Running the unit tests =
Beautiful Soup supports unit test discovery. You can run the tests
from the project root directory with this command:
Beautiful Soup supports unit test discovery from the project root directory:
$ python -m unittest discover -s bs4
$ nosetests
$ python -m unittest discover -s bs4 # Python 2.7 and up
If you checked out the source tree, you should see a script in the
home directory called test-all-versions. This script will run the unit
......
Optimizations
-------------
The html5lib tree builder doesn't use the standard tree-building API,
which worries me. (This may also be why the tree builder doesn't
support SoupStrainers, but I think that has more to do with the fact
that the html5lib tree builder is constantly rearranging the tree, and
will crash if something it parsed earlier didn't actually make it into
the tree.)
markup_attr_map can be optimized since it's always a map now.
CDATA
-----
The lxml etree.XMLParser has a strip_cdata argument that, when set to
False, should allow Beautiful Soup to preserve CDATA sections instead
of treating them as text. Except it doesn't. (This argument is also
present for HTMLParser, and also does nothing there.)
Currently, html5lib converts CDATA sections into comments. An
as-yet-unreleased version of html5lib changes the parser's handling of
CDATA sections to allow CDATA sections in tags like <svg> and
<math>. The HTML5TreeBuilder will need to be updated to create CData
objects instead of Comment objects in this situation.
from bs4 import BeautifulSoup
BeautifulSoup(open("838800.html"), "html5lib")
......@@ -3,22 +3,21 @@ Elixir and Tonic
"The Screen-Scraper's Friend"
http://www.crummy.com/software/BeautifulSoup/
Beautiful Soup uses a plug-in parser to parse a (possibly invalid) XML
or HTML document into a tree representation. The parser does the work
of building a parse tree, and Beautiful Soup provides methods
and Pythonic idioms that make it easy to navigate, search, and modify
the parse tree.
Beautiful Soup uses a pluggable XML or HTML parser to parse a
(possibly invalid) document into a tree representation. Beautiful Soup
provides methods and Pythonic idioms that make it easy to
navigate, search, and modify the parse tree.
Beautiful Soup works with Python 2.6 and up. It works better if lxml
or html5lib is installed.
and/or html5lib is installed.
For more than you ever wanted to know about Beautiful Soup, see the
documentation:
http://www.crummy.com/software/BeautifulSoup/documentation.html
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
"""
__author__ = "Leonard Richardson (leonardr@segfault.org)"
__version__ = "4.0.0b5"
__version__ = "4.0.0b8"
__copyright__ = "Copyright (c) 2004-2012 Leonard Richardson"
__license__ = "MIT"
......@@ -194,9 +193,9 @@ class BeautifulSoup(Tag):
self.tagStack = []
self.pushTag(self)
def new_tag(self, name, **attrs):
def new_tag(self, name, namespace=None, nsprefix=None, **attrs):
"""Create a new tag associated with this soup."""
return Tag(None, self.builder, name, attrs)
return Tag(None, self.builder, name, namespace, nsprefix, attrs)
def new_string(self, s):
"""Create a new NavigableString associated with this soup."""
......@@ -250,7 +249,7 @@ class BeautifulSoup(Tag):
self.previous_element = o
self.currentTag.contents.append(o)
def _popToTag(self, name, inclusivePop=True):
def _popToTag(self, name, nsprefix=None, inclusivePop=True):
"""Pops the tag stack up to and including the most recent
instance of the given tag. If inclusivePop is false, pops the tag
stack up to but *not* including the most recent instance of
......@@ -263,7 +262,8 @@ class BeautifulSoup(Tag):
mostRecentTag = None
for i in range(len(self.tagStack) - 1, 0, -1):
if name == self.tagStack[i].name:
if (name == self.tagStack[i].name
and nsprefix == self.tagStack[i].nsprefix):
numPops = len(self.tagStack) - i
break
if not inclusivePop:
......@@ -273,7 +273,7 @@ class BeautifulSoup(Tag):
mostRecentTag = self.popTag()
return mostRecentTag
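The namespace-aware `_popToTag` above walks the tag stack from the top down. It stops at the first tag whose name *and* namespace prefix both match. Below is a standalone Python 3 sketch of that logic. Tags are modeled as hypothetical `(name, nsprefix)` tuples instead of real `Tag` objects, and `pop_to_tag` is an illustrative name.

```python
def pop_to_tag(stack, name, nsprefix=None, inclusive_pop=True):
    """Pop `stack` back to the most recent entry matching (name, nsprefix).

    `stack` is a list of (name, nsprefix) tuples standing in for Tag
    objects. Index 0 is the document root and is never popped. If
    inclusive_pop is False, the matching entry stays on the stack.
    Returns the last entry popped, or None if nothing was popped."""
    num_pops = 0
    # Walk from the top of the stack down, stopping above the root.
    for i in range(len(stack) - 1, 0, -1):
        if stack[i] == (name, nsprefix):
            num_pops = len(stack) - i
            break
    if not inclusive_pop:
        num_pops -= 1
    most_recent = None
    for _ in range(num_pops):
        most_recent = stack.pop()
    return most_recent


stack = [("[document]", None), ("html", None), ("svg", "svg"), ("rect", "svg")]
print(pop_to_tag(stack, "svg", "svg"))  # ('svg', 'svg'); <rect> was popped too
print(stack)                            # [('[document]', None), ('html', None)]
```

If the prefix doesn't match, nothing is popped. So an end tag `</svg:svg>` cannot close a plain `<svg>`, and the reverse is also true.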
def handle_starttag(self, name, attrs):
def handle_starttag(self, name, namespace, nsprefix, attrs):
"""Push a start tag on to the stack.
If this method returns None, the tag was rejected by the
......@@ -282,7 +282,7 @@ class BeautifulSoup(Tag):
don't call handle_endtag.
"""
#print "Start tag %s: %s" % (name, attrs)
# print "Start tag %s: %s" % (name, attrs)
self.endData()
if (self.parse_only and len(self.tagStack) <= 1
......@@ -290,8 +290,8 @@ class BeautifulSoup(Tag):
or not self.parse_only.search_tag(name, attrs))):
return None
tag = Tag(self, self.builder, name, attrs, self.currentTag,
self.previous_element)
tag = Tag(self, self.builder, name, namespace, nsprefix, attrs,
self.currentTag, self.previous_element)
if tag is None:
return tag
if self.previous_element:
......@@ -300,10 +300,10 @@ class BeautifulSoup(Tag):
self.pushTag(tag)
return tag
def handle_endtag(self, name):
def handle_endtag(self, name, nsprefix=None):
#print "End tag: " + name
self.endData()
self._popToTag(name)
self._popToTag(name, nsprefix)
def handle_data(self, data):
self.currentData.append(data)
......
......@@ -82,9 +82,9 @@ class TreeBuilder(object):
empty_element_tags = None # A tag will be considered an empty-element
# tag when and only when it has no contents.
# A value for these attributes is a space- or comma-separated list
# of CDATA, rather than a single CDATA.
cdata_list_attributes = None
# A value for these tag/attribute combinations is a space- or
# comma-separated list of CDATA, rather than a single CDATA.
cdata_list_attributes = {}
def __init__(self):
......@@ -201,8 +201,22 @@ class HTMLTreeBuilder(TreeBuilder):
# encounter one of these attributes, we will parse its value into
# a list of values if possible. Upon output, the list will be
# converted back into a string.
cdata_list_attributes = set(
['class', 'rel', 'rev', 'archive', 'accept-charset', 'headers'])
cdata_list_attributes = {
"*" : ['class', 'accesskey', 'dropzone'],
"a" : ['rel', 'rev'],
"link" : ['rel', 'rev'],
"td" : ["headers"],
"th" : ["headers"],
"form" : ["accept-charset"],
"object" : ["archive"],
# These are HTML5 specific, as are *.accesskey and *.dropzone above.
"area" : ["rel"],
"icon" : ["sizes"],
"iframe" : ["sandbox"],
"output" : ["for"],
}
# Used by set_up_substitutions to detect the charset in a META tag
CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)", re.M)
......
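The new `cdata_list_attributes` mapping lists, for each tag name, the attributes that hold several values, with `"*"` meaning "every tag". The `Tag` constructor combines the universal and tag-specific lists with `itertools.chain`. Below is a Python 3 sketch of that lookup plus the split. It uses a trimmed copy of the mapping. `split_multi_valued` is a hypothetical name. Splitting on whitespace only is an assumption: the comment in `TreeBuilder` mentions comma-separated values too, and this sketch does not handle those.

```python
import itertools

# A trimmed copy of the per-tag mapping above. "*" applies to every tag.
CDATA_LIST_ATTRIBUTES = {
    "*": ["class", "accesskey", "dropzone"],
    "a": ["rel", "rev"],
    "td": ["headers"],
}


def split_multi_valued(tag_name, attrs, mapping=CDATA_LIST_ATTRIBUTES):
    """Return a copy of attrs in which every multi-valued attribute,
    universal or specific to tag_name, is split on whitespace."""
    attrs = dict(attrs)
    universal = mapping.get("*", [])
    tag_specific = mapping.get(tag_name.lower(), [])
    for attr in itertools.chain(universal, tag_specific):
        if attr in attrs:
            attrs[attr] = attrs[attr].split()
    return attrs


print(split_multi_valued("A", {"class": "x y", "rel": "nofollow", "href": "/"}))
# {'class': ['x', 'y'], 'rel': ['nofollow'], 'href': '/'}
```

Keying the mapping by tag avoids splitting an attribute such as `rel` on a tag that doesn't define it as multi-valued. The older flat `set` could not make that distinction.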
......@@ -2,18 +2,16 @@ __all__ = [
'HTML5TreeBuilder',
]
import warnings
from bs4.builder import (
PERMISSIVE,
HTML,
HTML_5,
HTMLTreeBuilder,
)
from bs4.element import NamespacedAttribute
import html5lib
from html5lib.constants import (
DataLossWarning,
namespaces,
)
import warnings
from html5lib.constants import namespaces
from bs4.element import (
Comment,
Doctype,
......@@ -33,6 +31,8 @@ class HTML5TreeBuilder(HTMLTreeBuilder):
# These methods are defined by Beautiful Soup.
def feed(self, markup):
if self.soup.parse_only is not None:
warnings.warn("You provided a value for parse_only, but the html5lib tree builder doesn't support parse_only. The entire document will be parsed.")
parser = html5lib.HTMLParser(tree=self.create_treebuilder)
doc = parser.parse(markup, encoding=self.user_specified_encoding)
......@@ -58,9 +58,6 @@ class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
def __init__(self, soup, namespaceHTMLElements):
self.soup = soup
if namespaceHTMLElements:
warnings.warn("namespaceHTMLElements not supported yet",
DataLossWarning)
super(TreeBuilderForHtml5lib, self).__init__(namespaceHTMLElements)
def documentClass(self):
......@@ -76,9 +73,8 @@ class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
self.soup.object_was_parsed(doctype)
def elementClass(self, name, namespace):
if namespace is not None:
warnings.warn("BeautifulSoup cannot represent elements in any namespace", DataLossWarning)
return Element(Tag(self.soup, self.soup.builder, name), self.soup, namespace)
tag = self.soup.new_tag(name, namespace)
return Element(tag, self.soup, namespace)
def commentClass(self, data):
return TextNode(Comment(data), self.soup)
......@@ -89,10 +85,8 @@ class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
return Element(self.soup, self.soup, None)
def appendChild(self, node):
self.soup.insert(len(self.soup.contents), node.element)
def testSerializer(self, element):
return testSerializer(element)
# XXX This code is not covered by the BS4 tests.
self.soup.append(node.element)
def getDocument(self):
return self.soup
......@@ -126,31 +120,17 @@ class Element(html5lib.treebuilders._base.Node):
self.soup = soup
self.namespace = namespace
def _nodeIndex(self, node, refNode):
# Finds a node by identity rather than equality
for index in range(len(self.element.contents)):
if id(self.element.contents[index]) == id(refNode.element):
return index
return None
def appendChild(self, node):
if (node.element.__class__ == NavigableString and self.element.contents
and self.element.contents[-1].__class__ == NavigableString):
# Concatenate new text onto old text node
# (TODO: This has O(n^2) performance, for input like "a</a>a</a>a</a>...")
newStr = NavigableString(self.element.contents[-1]+node.element)
# Remove the old text node
# (Can't simply use .extract() by itself, because it fails if
# an equal text node exists within the parent node)
oldElement = self.element.contents[-1]
del self.element.contents[-1]
oldElement.parent = None
oldElement.extract()
self.element.insert(len(self.element.contents), newStr)
# XXX This has O(n^2) performance, for input like
# "a</a>a</a>a</a>..."
old_element = self.element.contents[-1]
new_element = self.soup.new_string(old_element + node.element)
old_element.replace_with(new_element)
else:
self.element.insert(len(self.element.contents), node.element)
self.element.append(node.element)
node.parent = self
def getAttributes(self):
......@@ -159,61 +139,55 @@ class Element(html5lib.treebuilders._base.Node):
def setAttributes(self, attributes):
if attributes is not None and attributes != {}:
for name, value in list(attributes.items()):
if isinstance(name, tuple):
name = NamespacedAttribute(*name)
self.element[name] = value
# The attributes may contain variables that need substitution.
# Call set_up_substitutions manually.
# The Tag constructor calls this method automatically,
# but html5lib creates a Tag object before setting up
# the attributes.
#
# The Tag constructor called this method when the Tag was created,
# but we just set/changed the attributes, so call it again.
self.element.contains_substitutions = (
self.soup.builder.set_up_substitutions(
self.element))
attributes = property(getAttributes, setAttributes)
def insertText(self, data, insertBefore=None):
text = TextNode(NavigableString(data), self.soup)
text = TextNode(self.soup.new_string(data), self.soup)
if insertBefore:
self.insertBefore(text, insertBefore)
else:
self.appendChild(text)
def insertBefore(self, node, refNode):
index = self._nodeIndex(node, refNode)
index = self.element.index(refNode.element)
if (node.element.__class__ == NavigableString and index > 0
and self.element.contents[index-1].__class__ == NavigableString):
# (See comments in appendChild)
newStr = NavigableString(self.element.contents[index-1]+node.element)
oldNode = self.element.contents[index-1]
del self.element.contents[index-1]
oldNode.parent = None
oldNode.extract()
self.element.insert(index-1, newStr)
old_node = self.element.contents[index-1]
new_str = self.soup.new_string(old_node + node.element)
old_node.replace_with(new_str)
else:
self.element.insert(index, node.element)
node.parent = self
def removeChild(self, node):
index = self._nodeIndex(node.parent, node)
# XXX This if statement is problematic:
# https://bugs.launchpad.net/beautifulsoup/+bug/838800
if index is not None:
del node.parent.element.contents[index]
node.element.parent = None
node.element.extract()
node.parent = None
def reparentChildren(self, newParent):
while self.element.contents:
child = self.element.contents[0]
child.extract()
if isinstance(child, Tag):
newParent.appendChild(Element(child, self.soup, namespaces["html"]))
newParent.appendChild(
Element(child, self.soup, namespaces["html"]))
else:
newParent.appendChild(TextNode(child, self.soup))
newParent.appendChild(
TextNode(child, self.soup))
def cloneNode(self):
node = Element(Tag(self.soup, self.soup.builder, self.element.name), self.soup, self.namespace)
tag = self.soup.new_tag(self.element.name, self.namespace)
node = Element(tag, self.soup, self.namespace)
for key,value in self.attributes:
node.attributes[key] = value
return node
......
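In the html5lib builder above, `appendChild` and `insertBefore` both merge a new text node into the text node next to it instead of creating two adjacent strings. `insertBefore` finds the reference node by its position in the parent. Below is a self-contained Python 3 sketch of the same merging rule. The `Node` class and its method names are hypothetical stand-ins for the html5lib `Element` wrapper.

```python
class Node:
    """Hypothetical minimal element whose children are str (text) or Node."""

    def __init__(self, name):
        self.name = name
        self.contents = []

    def append_child(self, child):
        if (isinstance(child, str) and self.contents
                and isinstance(self.contents[-1], str)):
            # Merge with the trailing text node instead of adding a
            # second, adjacent one.
            self.contents[-1] = self.contents[-1] + child
        else:
            self.contents.append(child)

    def insert_before(self, child, ref):
        # Locate the reference node by identity, not equality.
        # Raises StopIteration if ref is not a child.
        index = next(i for i, c in enumerate(self.contents) if c is ref)
        if (isinstance(child, str) and index > 0
                and isinstance(self.contents[index - 1], str)):
            # Merge with the text node immediately before `ref`. The
            # index > 0 check stops contents[-1] from wrapping around
            # to the last child.
            self.contents[index - 1] = self.contents[index - 1] + child
        else:
            self.contents.insert(index, child)
```

Each merge builds a new string, so a long run of alternating text and tags costs O(n^2). That is the performance concern noted in the `XXX` comment above.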
......@@ -38,37 +38,10 @@ from bs4.builder import (
HTMLPARSER = 'html.parser'
class HTMLParserTreeBuilder(HTMLParser, HTMLTreeBuilder):
is_xml = False
features = [HTML, STRICT, HTMLPARSER]
def __init__(self, *args, **kwargs):
if CONSTRUCTOR_TAKES_STRICT:
kwargs['strict'] = False
return super(HTMLParserTreeBuilder, self).__init__(*args, **kwargs)
def prepare_markup(self, markup, user_specified_encoding=None,
document_declared_encoding=None):
"""
:return: A 4-tuple (markup, original encoding, encoding
declared within markup, whether any characters had to be
replaced with REPLACEMENT CHARACTER).
"""
if isinstance(markup, unicode):
return markup, None, None, False
try_encodings = [user_specified_encoding, document_declared_encoding]
dammit = UnicodeDammit(markup, try_encodings, is_html=True)
return (dammit.markup, dammit.original_encoding,
dammit.declared_html_encoding,
dammit.contains_replacement_characters)
def feed(self, markup):
super(HTMLParserTreeBuilder, self).feed(markup)
class BeautifulSoupHTMLParser(HTMLParser):
def handle_starttag(self, name, attrs):
self.soup.handle_starttag(name, dict(attrs))
# XXX namespace
self.soup.handle_starttag(name, None, None, dict(attrs))
def handle_endtag(self, name):
self.soup.handle_endtag(name)
......@@ -80,9 +53,15 @@ class HTMLParserTreeBuilder(HTMLParser, HTMLTreeBuilder):
# XXX workaround for a bug in HTMLParser. Remove this once
# it's fixed.
if name.startswith('x'):
data = unichr(int(name.lstrip('x'), 16))
real_name = int(name.lstrip('x'), 16)
else:
data = unichr(int(name))
real_name = int(name)
try:
data = unichr(real_name)
except (ValueError, OverflowError), e:
data = u"\N{REPLACEMENT CHARACTER}"
self.handle_data(data)
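The `handle_charref` workaround above turns the body of a numeric character reference into a character. It falls back to REPLACEMENT CHARACTER when the number is not a valid code point. Below is a Python 3 sketch of that logic, using `chr` where the original uses Python 2's `unichr`. `decode_charref` is a hypothetical name. Unlike the code above, it also accepts an uppercase `X` prefix, which is an addition made for illustration.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def decode_charref(name):
    """Convert the part of a numeric character reference after '&#'
    ('233' for decimal, 'xe9' for hex) into a character.
    Code points that don't exist become U+FFFD instead of raising."""
    if name.startswith(("x", "X")):
        real_name = int(name.lstrip("xX"), 16)
    else:
        real_name = int(name)
    try:
        return chr(real_name)
    except (ValueError, OverflowError):
        # ValueError: beyond U+10FFFF. OverflowError: too big for a C int.
        return REPLACEMENT_CHARACTER


print(decode_charref("233"), decode_charref("x2603"))
print(repr(decode_charref("1114112")))  # '\ufffd' (0x110000 is out of range)
```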
def handle_entityref(self, name):
......@@ -120,6 +99,40 @@ class HTMLParserTreeBuilder(HTMLParser, HTMLTreeBuilder):
self.soup.handle_data(data)
self.soup.endData(ProcessingInstruction)
class HTMLParserTreeBuilder(HTMLTreeBuilder):
is_xml = False
features = [HTML, STRICT, HTMLPARSER]
def __init__(self, *args, **kwargs):
if CONSTRUCTOR_TAKES_STRICT:
kwargs['strict'] = False
self.parser_args = (args, kwargs)
def prepare_markup(self, markup, user_specified_encoding=None,
document_declared_encoding=None):
"""
:return: A 4-tuple (markup, original encoding, encoding
declared within markup, whether any characters had to be
replaced with REPLACEMENT CHARACTER).
"""
if isinstance(markup, unicode):
return markup, None, None, False
try_encodings = [user_specified_encoding, document_declared_encoding]
dammit = UnicodeDammit(markup, try_encodings, is_html=True)
return (dammit.markup, dammit.original_encoding,
dammit.declared_html_encoding,
dammit.contains_replacement_characters)
def feed(self, markup):
args, kwargs = self.parser_args
parser = BeautifulSoupHTMLParser(*args, **kwargs)
parser.soup = self.soup
parser.feed(markup)
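`HTMLParserTreeBuilder` above no longer subclasses `HTMLParser`. It stores its constructor arguments and creates a fresh `BeautifulSoupHTMLParser` on each `feed()`, and that parser forwards events to the soup. Below is a Python 3 sketch of the pattern. `EventCollector`, `ForwardingParser` and `TreeBuilder` are hypothetical names. The `strict` argument is omitted because Python 3's `html.parser.HTMLParser` no longer accepts it.

```python
from html.parser import HTMLParser


class EventCollector:
    """Stand-in for the soup object: records the events it receives."""

    def __init__(self):
        self.events = []

    def handle_starttag(self, name, namespace, nsprefix, attrs):
        self.events.append(("start", name, attrs))

    def handle_endtag(self, name, nsprefix=None):
        self.events.append(("end", name))

    def handle_data(self, data):
        self.events.append(("data", data))


class ForwardingParser(HTMLParser):
    """HTMLParser subclass that forwards each event to self.soup."""

    def handle_starttag(self, name, attrs):
        # html.parser has no namespace information, so pass None twice.
        self.soup.handle_starttag(name, None, None, dict(attrs))

    def handle_endtag(self, name):
        self.soup.handle_endtag(name)

    def handle_data(self, data):
        self.soup.handle_data(data)


class TreeBuilder:
    """Keeps the parser's constructor arguments; builds a parser per feed()."""

    def __init__(self, *args, **kwargs):
        self.parser_args = (args, kwargs)
        self.soup = None

    def feed(self, markup):
        args, kwargs = self.parser_args
        parser = ForwardingParser(*args, **kwargs)
        parser.soup = self.soup
        parser.feed(markup)
        parser.close()  # flush any buffered input


builder = TreeBuilder(convert_charrefs=True)
builder.soup = EventCollector()
builder.feed("<p class='x'>hi</p>")
print(builder.soup.events)
```

Building a new parser for each document means no parser state carries over from one document to the next.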
# Patch 3.2 versions of HTMLParser earlier than 3.2.3 to use some
# 3.2.3 code. This ensures they don't treat markup like <p></p> as a
# string.
......@@ -146,7 +159,7 @@ if major == 3 and minor == 2 and not CONSTRUCTOR_TAKES_STRICT:
)*
\s* # trailing whitespace
""", re.VERBOSE)
HTMLParserTreeBuilder.locatestarttagend = locatestarttagend
BeautifulSoupHTMLParser.locatestarttagend = locatestarttagend
from html.parser import tagfind, attrfind
......@@ -209,7 +222,7 @@ if major == 3 and minor == 2 and not CONSTRUCTOR_TAKES_STRICT:
self.cdata_elem = elem.lower()
self.interesting = re.compile(r'</\s*%s\s*>' % self.cdata_elem, re.I)
HTMLParserTreeBuilder.parse_starttag = parse_starttag
HTMLParserTreeBuilder.set_cdata_mode = set_cdata_mode
BeautifulSoupHTMLParser.parse_starttag = parse_starttag
BeautifulSoupHTMLParser.set_cdata_mode = set_cdata_mode
CONSTRUCTOR_TAKES_STRICT = True
......@@ -5,7 +5,7 @@ __all__ = [
import collections
from lxml import etree
from bs4.element import Comment, Doctype
from bs4.element import Comment, Doctype, NamespacedAttribute
from bs4.builder import (
FAST,
HTML,
......@@ -42,6 +42,15 @@ class LXMLTreeBuilderForXML(TreeBuilder):
parser = parser(target=self, strip_cdata=False)
self.parser = parser
self.soup = None
self.nsmaps = None
def _getNsTag(self, tag):
# Split the namespace URL out of a fully-qualified lxml tag
# name. Copied from lxml's src/lxml/sax.py.
if tag[0] == '{':
return tuple(tag[1:].split('}', 1))
else:
return (None, tag)
def prepare_markup(self, markup, user_specified_encoding=None,
document_declared_encoding=None):
......@@ -63,15 +72,56 @@ class LXMLTreeBuilderForXML(TreeBuilder):
self.parser.close()
def close(self):
pass
def start(self, name, attrs):
self.soup.handle_starttag(name, attrs)
self.nsmaps = None
def start(self, name, attrs, nsmap={}):
nsprefix = None
# Invert each namespace map as it comes in.
if len(nsmap) == 0 and self.nsmaps is not None:
# There are no new namespaces for this tag, but namespaces
# are in play, so we need a separate tag stack to know
# when they end.
self.nsmaps.append(None)
elif len(nsmap) > 0:
# A new namespace mapping has come into play.
if self.nsmaps is None:
self.nsmaps = []
inverted_nsmap = dict((value, key) for key, value in nsmap.items())
self.nsmaps.append(inverted_nsmap)
# Also treat the namespace mapping as a set of attributes on the
# tag, so we can recreate it later.
attrs = attrs.copy()
for prefix, namespace in nsmap.items():
attribute = NamespacedAttribute(
"xmlns", prefix, "http://www.w3.org/2000/xmlns/")
attrs[attribute] = namespace
namespace, name = self._getNsTag(name)
if namespace is not None:
for inverted_nsmap in reversed(self.nsmaps):
if inverted_nsmap is not None and namespace in inverted_nsmap:
nsprefix = inverted_nsmap[namespace]
break
self.soup.handle_starttag(name, namespace, nsprefix, attrs)
def end(self, name):
self.soup.endData()
completed_tag = self.soup.tagStack[-1]
self.soup.handle_endtag(name)
namespace, name = self._getNsTag(name)
nsprefix = None
if namespace is not None:
for inverted_nsmap in reversed(self.nsmaps):
if inverted_nsmap is not None and namespace in inverted_nsmap:
nsprefix = inverted_nsmap[namespace]
break
self.soup.handle_endtag(name, nsprefix)
if self.nsmaps is not None:
# This tag, or one of its parents, introduced a namespace
# mapping, so pop it off the stack.
self.nsmaps.pop()
if len(self.nsmaps) == 0:
# Namespaces are no longer in play, so don't bother keeping
# track of the namespace stack.
self.nsmaps = None
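The lxml builder's `start()` and `end()` keep a stack of inverted namespace maps (URL to prefix). This lets a fully qualified lxml name like `{url}rect` be reported with the prefix the document actually used. A `None` entry is pushed for tags that declare nothing new, so the stack stays in step with the tag stack. Below is a self-contained Python 3 sketch of that bookkeeping. `NamespaceTracker` and its methods are hypothetical. The lxml parser is simulated by calling the methods directly with lxml-style `{prefix: url}` maps.

```python
class NamespaceTracker:
    """Hypothetical re-implementation of the nsmap bookkeeping above.
    Call start() and end() once per tag, in document order."""

    def __init__(self):
        self.nsmaps = None  # None means no namespaces are in play

    @staticmethod
    def split(tag):
        if tag.startswith("{"):
            namespace, name = tag[1:].split("}", 1)
            return namespace, name
        return None, tag

    def _prefix_for(self, namespace):
        if namespace is None or self.nsmaps is None:
            return None
        # The innermost declaration wins, so search from the top.
        for inverted in reversed(self.nsmaps):
            if inverted is not None and namespace in inverted:
                return inverted[namespace]
        return None

    def start(self, tag, nsmap=None):
        nsmap = nsmap or {}
        if nsmap:
            if self.nsmaps is None:
                self.nsmaps = []
            # Invert {prefix: url} into {url: prefix} for lookups.
            self.nsmaps.append({url: prefix for prefix, url in nsmap.items()})
        elif self.nsmaps is not None:
            # No new declarations, but keep the stack aligned.
            self.nsmaps.append(None)
        namespace, name = self.split(tag)
        return name, namespace, self._prefix_for(namespace)

    def end(self, tag):
        namespace, name = self.split(tag)
        nsprefix = self._prefix_for(namespace)
        if self.nsmaps is not None:
            self.nsmaps.pop()
            if not self.nsmaps:
                self.nsmaps = None
        return name, nsprefix


SVG = "http://www.w3.org/2000/svg"
t = NamespaceTracker()
print(t.start("root"))                        # ('root', None, None)
print(t.start("{%s}svg" % SVG, {"svg": SVG}))  # ('svg', SVG, 'svg')
print(t.start("{%s}rect" % SVG))               # ('rect', SVG, 'svg')
```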
def pi(self, target, data):
pass
......
......@@ -9,6 +9,7 @@ encoding; that's the tree builder's job.
import codecs
from htmlentitydefs import codepoint2name
import re
import warnings
# Autodetects character encodings. Very useful.
# Download from http://chardet.feedparser.org/
......@@ -212,6 +213,10 @@ class UnicodeDammit:
if proposed_encoding != "ascii":
u = self._convert_from(proposed_encoding, "replace")
if u is not None:
warnings.warn(
UnicodeWarning(
"Some characters could not be decoded, and were "
"replaced with REPLACEMENT CHARACTER."))
self.contains_replacement_characters = True
break
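The `UnicodeDammit` change above does two things when it has to fall back to decoding with `"replace"`: it issues a `UnicodeWarning`, and it records that characters were lost. Below is a simplified Python 3 sketch of that fallback for a single proposed encoding. `decode_with_warning` is a hypothetical name. `UnicodeDammit` itself tries several candidate encodings before reaching this point.

```python
import warnings


def decode_with_warning(data, encoding):
    """Decode bytes strictly if possible. Otherwise decode with "replace",
    warn, and report that REPLACEMENT CHARACTERs were introduced.
    Returns (text, contains_replacement_characters)."""
    try:
        return data.decode(encoding), False
    except UnicodeDecodeError:
        pass
    text = data.decode(encoding, "replace")
    warnings.warn(UnicodeWarning(
        "Some characters could not be decoded, and were "
        "replaced with REPLACEMENT CHARACTER."))
    return text, True
```

Trying the strict decode first means the flag is set only when a replacement really happened. A document that already contains U+FFFD is therefore not misreported.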
......
import collections
import itertools
import re
import sys
import warnings
......@@ -21,6 +22,19 @@ def _alias(attr):
return alias
class NamespacedAttribute(unicode):
def __new__(cls, prefix, name, namespace=None):
if name is None:
obj = unicode.__new__(cls, prefix)
else:
obj = unicode.__new__(cls, prefix + ":" + name)
obj.prefix = prefix
obj.name = name
obj.namespace = namespace
return obj
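`NamespacedAttribute` subclasses the string type. It can therefore be used anywhere a plain attribute name is expected, such as a dictionary key, while also carrying its prefix, local name and namespace URL. Below is a Python 3 port of the class above, written for illustration; the original targets Python 2's `unicode`.

```python
class NamespacedAttribute(str):
    """A str that also remembers its prefix, local name and namespace URL.
    It compares and hashes like the plain string "prefix:name", or just
    "prefix" when name is None (as for a default xmlns declaration)."""

    def __new__(cls, prefix, name, namespace=None):
        if name is None:
            obj = str.__new__(cls, prefix)
        else:
            obj = str.__new__(cls, prefix + ":" + name)
        obj.prefix = prefix
        obj.name = name
        obj.namespace = namespace
        return obj


href = NamespacedAttribute("xlink", "href", "http://www.w3.org/1999/xlink")
attrs = {href: "#a"}
print(attrs["xlink:href"])  # #a
print(href.namespace)       # http://www.w3.org/1999/xlink
```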
class PageElement(object):
"""Contains the navigational information for some part of the page
(either a tag or a piece of text)"""
......@@ -499,15 +513,15 @@ class Doctype(NavigableString):
return Doctype(value)
PREFIX = u'<!DOCTYPE '
SUFFIX = u'>'
SUFFIX = u'>\n'
class Tag(PageElement):
"""Represents a found HTML tag with its attributes and contents."""
def __init__(self, parser=None, builder=None, name=None, attrs=None,
parent=None, previous=None):
def __init__(self, parser=None, builder=None, name=None, namespace=None,
nsprefix=None, attrs=None, parent=None, previous=None):
"Basic constructor."
if parser is None:
......@@ -519,20 +533,24 @@ class Tag(PageElement):
if name is None:
raise ValueError("No value provided for new tag's name.")
self.name = name
self.namespace = namespace
self.nsprefix = nsprefix
if attrs is None:
attrs = {}
else:
attrs = dict(attrs)
if builder.cdata_list_attributes:
for cdata_list_attr in builder.cdata_list_attributes:
universal = builder.cdata_list_attributes.get('*', [])
tag_specific = builder.cdata_list_attributes.get(
self.name.lower(), [])
for cdata_list_attr in itertools.chain(universal, tag_specific):
if cdata_list_attr in attrs:
# Basically, we have a "class" attribute whose