I want:
<div data-a>
But the lxml API seems to give me only this:
<div data-a=''>
How do I get value-less attributes?
It's annoying that lxml represents both blank values and null values as an empty string.
Setting the value to None does not help.
In [19]: from lxml.html import fromstring, tostring
In [20]: b = fromstring('<body class="meow" data-a="haha" data-b data-x="">text-fef27e87389e466fb99b5421629323f6</body>')
In [21]: b.attrib
Out[21]: {'data-a': 'haha', 'data-x': '', 'data-b': '', 'class': 'meow'}
In [22]: b = fromstring('<body class="meow" data-a="haha" data-b data-x="">text-fef27e87389e466fb99b5421629323f6</body>')
In [23]: b.attrib
Out[23]: {'data-a': 'haha', 'data-x': '', 'data-b': '', 'class': 'meow'}
In [24]: b.attrib['data-y'] = None
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-24-1f55133e3dc4> in <module>()
----> 1 b.attrib['data-y'] = None
/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:58775)()
/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:19025)()
/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._utf8 (src/lxml/lxml.etree.c:26460)()
TypeError: Argument must be bytes or unicode, got 'NoneType'
tag.attrib['data-a'] = None
TypeError: Argument must be bytes or unicode, got 'NoneType'
An element with no content is said to be empty. In XML, the two forms of an empty element (<element></element> and <element/>) produce identical results in XML software (readers, parsers, browsers). Empty elements can have attributes.
lxml is one of the fastest and feature-rich libraries for processing XML and HTML in Python. This library is essentially a wrapper over C libraries libxml2 and libxslt. This combines the speed of the native C library and the simplicity of Python.
lxml is a Python library which allows for easy handling of XML and HTML files, and can also be used for web scraping. There are a lot of off-the-shelf XML parsers out there, but for better results, developers sometimes prefer to write their own XML and HTML parsers.
lxml provides a very simple and powerful API for parsing XML and HTML. It supports one-step parsing as well as step-by-step parsing using an event-driven API (currently only for XML).
IMHO, lxml is demonstrating the expected behavior. An attribute without a value makes the XML not well-formed, and a decent XML parser doesn't produce XML that is not well-formed.
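A quick way to check this is to feed both forms to lxml's XML parser. The sketch below is written for Python 3; `is_well_formed` is a hypothetical helper name used only for illustration, not part of lxml.

```python
from lxml import etree


def is_well_formed(xml_text):
    """Return True if lxml's XML parser accepts the text, False otherwise."""
    try:
        etree.fromstring(xml_text)
        return True
    except etree.XMLSyntaxError:
        # Raised by the XML parser when the input is not well-formed.
        return False


# Attribute with an explicit (empty) value: accepted by the XML parser.
print(is_well_formed('<div data-a=""/>'))

# Attribute with no value at all: rejected by the XML parser.
print(is_well_formed('<div data-a/>'))
```

The first call should report the document as well-formed and the second should not, which is why lxml.etree has no way to write a value-less attribute.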
Looks like you are actually trying to manipulate HTML and not XML. If that is true, then use lxml.html instead of lxml.etree.
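As a minimal sketch of that suggestion (Python 3, variable names are illustrative only), parsing a fragment with lxml.html and serializing it again should keep the value-less attribute as it was written:

```python
import lxml.html

# Parse an HTML fragment that contains a value-less attribute.
element = lxml.html.fromstring('<div data-a></div>')

# Serialize with the HTML serializer; encoding='unicode' returns a str
# instead of bytes.
serialized = lxml.html.tostring(element, encoding='unicode')
print(serialized)

# Reading the attribute back still gives a string, not None.
print(repr(element.get('data-a')))
```

Note that this only covers attributes that were already value-less in the input. It does not let you tell them apart from blank ones through `attrib` or `get()`.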
You are trying to set a "boolean attribute", which is not to be confused with a "boolean value" (see "boolean attributes" in the HTML specification). As already stated in the other answer, the boolean attribute syntax is not allowed in XML.
However, since it seems obvious that you are trying to manipulate HTML, you can create a boolean attribute with an HTML element rather than an XML element.
import unittest
import lxml.html

class HtmlBooleanAttribute(unittest.TestCase):

    def test_booleanAttribute(self):
        # !!! BE SURE TO CREATE AN ****HTML**** ELEMENT !!!
        div = lxml.html.Element('div')

        # Set a boolean attribute; omitting the value or providing None will
        # create a boolean attribute.
        div.set('data-a')
        div.set('data-b', None)

        # Setting the value to an empty string will not give you a boolean attribute
        div.set('data-c', '')

        # Set a normal attribute for comparison
        div.set('class', 'big red')

        print
        print lxml.html.tostring(div)
        print

        # Note that 'data-a' will be a zero-length string
        print 'data-a = ', div.get('data-a')
        print 'type(data-a) = ', type(div.get('data-a'))
        print 'len(data-a) = ', len(div.get('data-a'))
        print
        print 'data-c = ', div.get('data-c')
        print 'type(data-c) = ', type(div.get('data-c'))
        print 'len(data-c) = ', len(div.get('data-c'))

if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()
Output
<div data-a data-b data-c="" class="big red"></div>
data-a =
type(data-a) = <type 'str'>
len(data-a) = 0
data-c =
type(data-c) = <type 'str'>
len(data-c) = 0
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
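Note that data-a and data-c both read back as zero-length strings, but they serialize differently: data-a (like data-b) is written without a value, while data-c is written as data-c="".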
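For readers on Python 3, the same comparison can be sketched more compactly. This assumes an lxml release in which lxml.html elements accept `set()` without a value, as the test above relies on; the variable names are illustrative.

```python
import lxml.html

# Create an HTML element (not an etree XML element).
div = lxml.html.Element('div')

div.set('data-a')        # no value given: boolean (value-less) attribute
div.set('data-b', None)  # None has the same effect on an HTML element
div.set('data-c', '')    # explicit empty string: serialized with =""

# encoding='unicode' makes tostring() return a str instead of bytes.
serialized = lxml.html.tostring(div, encoding='unicode')
print(serialized)

# All three read back as strings, so get() cannot tell them apart.
for name in ('data-a', 'data-b', 'data-c'):
    print(name, repr(div.get(name)))
```

The difference between the two kinds of attribute is visible only in the serialized markup, not in the values returned by `get()`.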