How to get text for a root element using lxml?

Tags:

python

lxml

I'm completely stumped why lxml .text will give me the text for a child tag but not for the root tag.

some_tag = etree.fromstring('<some_tag class="abc"><strong>Hello</strong> World</some_tag>')

some_tag.find("strong")
Out[195]: <Element strong at 0x7427d00>

some_tag.find("strong").text
Out[196]: 'Hello'

some_tag
Out[197]: <Element some_tag at 0x7bee508>

some_tag.text

some_tag.find("strong").text returns the text between the <strong> tags.

I expect some_tag.text to return everything between <some_tag> ... </some_tag>

Expected:

<strong>Hello</strong> World

Instead, it returns nothing.

596

asked Apr 21 '12 11:04

Jason Wirth

1 Answer

from lxml import etree

XML = '<some_tag class="abc"><strong>Hello</strong> World</some_tag>'

some_tag = etree.fromstring(XML)

for element in some_tag:
    print(element.tag, element.text, element.tail)

Output:

strong Hello  World

For information on the .text and .tail properties, see:

http://lxml.de/tutorial.html#elements-contain-text
http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-view.html
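
As a rough illustration of what those pages describe, here is a minimal sketch of where each piece of text ends up for the XML in the question. The fallback import of the standard library's xml.etree.ElementTree is my own addition so the snippet runs without lxml installed; it exposes the same .text, .tail and itertext() members used here.

```python
# Prefer lxml; the stdlib fallback is only so this sketch runs without it.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

XML = '<some_tag class="abc"><strong>Hello</strong> World</some_tag>'
root = etree.fromstring(XML)
strong = root.find("strong")

# No text sits between <some_tag> and <strong>, so the root's .text is None.
print(repr(root.text))
# "Hello" is inside <strong>, so it is that element's .text.
print(repr(strong.text))
# " World" follows </strong>, so it is stored as the child's .tail.
print(repr(strong.tail))

# itertext() walks the .text and .tail values in document order.
all_text = "".join(root.itertext())
print(all_text)
```

If only the plain text is wanted, without the markup, joining itertext() as above is probably the simplest route.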

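To get exactly the result that you expected, serialize the child element. tostring() includes the element's tail by default (with_tail=True), and encoding="unicode" makes it return a str rather than bytes on Python 3:

print(etree.tostring(some_tag.find("strong"), encoding="unicode"))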

Output:

<strong>Hello</strong> World

102

answered Oct 05 '22 20:10

mzjn
