Get all text from an XML document?

2 Answers

Using stdlib xml.etree

import xml.etree.ElementTree as ET

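tree = ET.parse('sample.xml')
# encoding='unicode' returns a str; encoding='utf-8' would return bytes (printed as b'...' on Python 3)
print(ET.tostring(tree.getroot(), encoding='unicode', method='text'))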
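If you need the text as separate fragments rather than one concatenated string, Element.itertext() from the same module may be a better fit. A minimal sketch, assuming a small inline document (xml_source is a made-up sample, not the asker's file):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
xml_source = "<root><title>Hello</title><body>big <b>wide</b> world</body></root>"

root = ET.fromstring(xml_source)

# itertext() yields every text and tail fragment in document order.
fragments = [t.strip() for t in root.itertext()]

# Drop whitespace-only fragments and join the rest with single spaces.
all_text = " ".join(t for t in fragments if t)
print(all_text)
```

Joining the fragments yourself lets you choose the separator, whereas method='text' writes the fragments back to back with nothing between them.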


answered Oct 04 '22 08:10

schettino72

I really like BeautifulSoup, and would rather not use regex on HTML if we can avoid it.

Adapted from: [this StackOverflow Answer], [BeautifulSoup documentation]

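from bs4 import BeautifulSoup

# txt is a string containing your XML document; the "xml" parser requires lxml
soup = BeautifulSoup(txt, "xml")
# find_all(string=True) returns every text node (current spelling of findAll(text=True))
page_text = soup.find_all(string=True)
print(' '.join(page_text))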

Though of course, you can (and should) use BeautifulSoup to navigate the page for what you are looking for.
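As a rough illustration of navigating to specific elements instead of dumping every text node, here is a small sketch. The document and its tag names (library, book, name, year) are invented for the example, and it uses the built-in "html.parser" only so that it runs without lxml; for real XML you would likely pass "xml" instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; the tag names are assumptions for illustration.
txt = (
    "<library>"
    "<book><name>Dune</name><year>1965</year></book>"
    "<book><name>Emma</name><year>1815</year></book>"
    "</library>"
)

# "html.parser" ships with Python; it is used here to avoid the lxml dependency.
soup = BeautifulSoup(txt, "html.parser")

# Collect the text of every <name> element only, ignoring the rest.
names = [tag.get_text(strip=True) for tag in soup.find_all("name")]
print(names)
```

Selecting by tag name like this keeps unrelated text (the year values here) out of the result.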

answered Oct 04 '22 08:10

Prashant Kumar

Tags:

python

xml

lxml

Richard
