
lxml Element boolean check

Tags: python, lxml

This code:

from lxml.html import fromstring, tostring

s = '<span class="left">Whatever</span>'
e = fromstring(s)
print(tostring(e))
print(bool(e))

outputs:

<span class="left">Whatever</span>
False

Why? How does the boolean check work in this class? Please point me to the relevant documentation or code.

P.S. I'm using lxml 3.3.5.

Gill Bates asked Jul 09 '14 18:07



2 Answers

The relevant place in the Python documentation: https://docs.python.org/2/library/stdtypes.html#truth-value-testing

The "truthiness" of an object is determined by its __nonzero__() method (named __bool__() in Python 3) or, if that does not exist, by the result of its __len__() method. As your element has no child elements, i.e. its length is 0, it is considered False as a truth value.
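To illustrate the fallback described above without depending on lxml, here is a minimal sketch in Python 3. The classes Leaf, Branch and AlwaysTrue are hypothetical stand-ins made up for this example; they are not part of lxml and only mimic an object whose length is its number of children.

```python
class Leaf:
    """Hypothetical container with no children and no __bool__ method."""

    def __len__(self):
        return 0


class Branch:
    """Hypothetical container that reports two children."""

    def __len__(self):
        return 2


class AlwaysTrue:
    """Hypothetical container where __bool__ takes priority over __len__."""

    def __bool__(self):
        return True

    def __len__(self):
        return 0


# bool() falls back to __len__ when no __bool__ is defined.
leaf_truth = bool(Leaf())
branch_truth = bool(Branch())
# When __bool__ exists, __len__ is not consulted.
override_truth = bool(AlwaysTrue())

print(leaf_truth, branch_truth, override_truth)  # False True True
```

An element with text but no children behaves like Leaf here: its length is 0, so it tests as False.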

BlackJack answered Oct 03 '22 22:10


XML and HTML don't map cleanly to native Python data structures. There is no unambiguous way to decide whether an element object should equate to True or False.

If you want to know if you've failed to acquire an element, compare with None. E.g.:

element is None

If you want to know whether your element has any child nodes, use len. E.g.:

len(element) > 0
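
Both checks can be sketched together as follows. This example uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without lxml installed; that substitution, the sample markup and the variable names are assumptions for illustration. lxml.etree follows the same ElementTree API for find() and len().

```python
import xml.etree.ElementTree as ET

# Sample markup invented for this example.
root = ET.fromstring('<div><span class="left">Whatever</span></div>')

span = root.find("span")   # an element that exists
missing = root.find("p")   # find() returns None when nothing matches

# "Did I get an element at all?" -> compare with None.
found_span = span is not None
found_p = missing is not None

# "Does the element have child nodes?" -> use len().
span_has_children = len(span) > 0
root_has_children = len(root) > 0

print(found_span, found_p)                   # True False
print(span_has_children, root_has_children)  # False True
```

Note that span is found but has no children, which is exactly the case where a plain truth test on the element would be misleading.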

MattH answered Oct 03 '22 21:10