
What's the functional difference between `etree.fromstring()` and `etree.XML()` in lxml?

Tags: python, lxml

lxml offers a few different functions to parse strings. Two of them, etree.fromstring() and etree.XML(), seem very similar. The docstring for the former says it parses "strings", while the docstring for the latter says "string constants". Additionally, XML()'s docstring states:

This function can be used to embed "XML literals" in Python code, [...]

What's the functional difference between these functions? When should one be used over the other?

asked Aug 06 '17 19:08 by outis



1 Answer

Comparing the source code of XML() and fromstring(), the former has this extra snippet:

if parser is None:
    parser = __GLOBAL_PARSER_CONTEXT.getDefaultParser()
    if not isinstance(parser, XMLParser):
        parser = __DEFAULT_XML_PARSER

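The two functions thus differ in which parser they use when none is passed explicitly. fromstring() uses whatever the current default parser is. XML() uses the current default only if it is an XMLParser; otherwise it falls back to lxml's built-in default XML parser. So if the default parser is changed to a non-XMLParser, such as an HTMLParser, XML() ignores it: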
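To see the fallback in isolation, here is a small pure-Python model of the selection logic. It is only a sketch: the class names, `BUILTIN_XML_PARSER`, `current_default`, and the two `parser_for_*` helpers are hypothetical stand-ins for illustration, not lxml's real internals.

```python
# Toy model of how each function chooses a parser. All names here are
# hypothetical stand-ins; this is not lxml's actual implementation.
class XMLParser:
    pass


class HTMLParser:
    pass


BUILTIN_XML_PARSER = XMLParser()      # plays the role of __DEFAULT_XML_PARSER
current_default = BUILTIN_XML_PARSER  # what getDefaultParser() would return


def set_default_parser(parser=None):
    """Change the default parser; no argument restores the built-in one."""
    global current_default
    current_default = BUILTIN_XML_PARSER if parser is None else parser


def parser_for_fromstring(parser=None):
    # fromstring(): take the current default as-is, whatever its type.
    return current_default if parser is None else parser


def parser_for_XML(parser=None):
    # XML(): take the current default only if it is an XML parser.
    if parser is None:
        parser = current_default
        if not isinstance(parser, XMLParser):
            parser = BUILTIN_XML_PARSER
    return parser
```

In this model, an explicitly passed parser is always respected by both helpers; only the implicit default is filtered by the XML() variant.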

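from lxml import etree

# Make an HTML parser the default parser.
etree.set_default_parser(etree.HTMLParser())
# fromstring() picks up the HTML default and wraps the element in <html><body>.
etree.tostring(etree.fromstring("<root/>"))
# b'<html><body><root/></body></html>'
# XML() ignores the non-XML default and parses the input as XML.
etree.tostring(etree.XML("<root/>"))
# b'<root/>'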
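
The difference only appears when no parser is passed. As a minimal sketch (assuming lxml is installed; the variable names are illustrative), passing `parser=` explicitly makes both functions behave the same, and calling `set_default_parser()` with no argument restores the original default:

```python
from lxml import etree

# Install an HTML parser as the default, as in the example above.
etree.set_default_parser(etree.HTMLParser())
try:
    xml_parser = etree.XMLParser()
    # With an explicit parser, neither function consults the default.
    via_fromstring = etree.tostring(etree.fromstring("<root/>", parser=xml_parser))
    via_xml = etree.tostring(etree.XML("<root/>", parser=xml_parser))
finally:
    # Calling set_default_parser() without arguments restores the original default.
    etree.set_default_parser()

# After the reset, fromstring() parses its input as XML again.
after_reset = etree.tostring(etree.fromstring("<root/>"))
print(via_fromstring, via_xml, after_reset)
```

So a practical rule of thumb is to reach for XML() when the input is known to be XML regardless of global state, or to pass a parser explicitly to either function.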

answered Sep 29 '22 12:09 by outis