
Get all text from an XML document?

Tags:

python

xml

lxml

How can I get all the text content of an XML document as a single string, like this Ruby/hpricot example, but using Python?

I'd like each XML tag to be replaced with a single space.

asked Jul 08 '13 15:07 by Richard



2 Answers

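Using the standard library's xml.etree.ElementTree:

import xml.etree.ElementTree as ET

# Parse the file, then serialize only the text nodes (tags are dropped).
tree = ET.parse('sample.xml')
# encoding='unicode' returns a str; encoding='utf-8' returns bytes on Python 3.
print(ET.tostring(tree.getroot(), encoding='unicode', method='text'))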
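
Note that method='text' concatenates the text nodes without putting anything between them. If a single space between the pieces is wanted, as the question asks, one possible sketch is to join the strings yielded by Element.itertext(). The helper name xml_text and the sample string below are made up for illustration:

```python
import xml.etree.ElementTree as ET

def xml_text(xml_string):
    """Return all text in the document, with pieces separated by single spaces."""
    root = ET.fromstring(xml_string)
    # itertext() yields every text and tail string in document order
    pieces = (piece.strip() for piece in root.itertext())
    # drop whitespace-only pieces so indentation does not produce extra spaces
    return ' '.join(piece for piece in pieces if piece)

# Hypothetical sample document, for illustration only
sample = "<doc><title>Hello</title><body>big <b>wide</b> world</body></doc>"
print(xml_text(sample))  # Hello big wide world
```

Stripping each piece also removes leading and trailing whitespace inside elements, which may or may not be what you want for whitespace-sensitive content.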
answered Oct 04 '22 08:10 by schettino72


I really like BeautifulSoup, and would rather not use regex on HTML or XML if we can avoid it.

Adapted from: [this Stack Overflow answer], [BeautifulSoup documentation]

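from bs4 import BeautifulSoup

# txt is a string holding the contents of your XML file
soup = BeautifulSoup(txt, 'xml')    # the 'xml' parser requires lxml to be installed
pageText = soup.find_all(string=True)    # every text node, in document order
print(' '.join(pageText))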

Though of course, you can (and should) use BeautifulSoup to navigate the page for what you are looking for.
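
As a rough illustration of navigating to a specific element instead of dumping everything, here is a minimal sketch. It assumes bs4 is installed; the sample document and its tag names are hypothetical. It uses 'html.parser' only because that parser ships with Python; with lxml installed, 'xml' is the more appropriate choice for real XML.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, for illustration only
txt = "<book><heading>Dune</heading><author>Frank Herbert</author></book>"

# 'html.parser' needs no extra install; swap in 'xml' if lxml is available
soup = BeautifulSoup(txt, 'html.parser')

# Navigate to one element and read just its text
author = soup.find('author').get_text()

# Or collect all text, with pieces separated by single spaces
all_text = soup.get_text(separator=' ', strip=True)

print(author)
print(all_text)
```

get_text(separator=' ', strip=True) strips each text node and skips empty ones before joining, so indentation in the source file does not leak into the result.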

answered Oct 04 '22 08:10 by Prashant Kumar