Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parse an XML string in Python

Tags:

python

xml

I have this XML string result and i need to get the values in between the tags. But the data type of the XML is string.

  final = "  <Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table>

    <Table><Claimable>false</Claimable><MinorRev>71115</MinorRev><Operation>530600 ION MILL</Operation><Experiment>6794</Experiment><HTNum>162</HTNum><WaferEC>71105</WaferEC><HolderType>HACARR</HolderType><Job>16799006</Job></Table> "

This is my code sample

root = ET.fromstring(final)
print root

And this is the error i am receiving :

xml.parsers.expat.ExpatError: The markup in the document following the root element must be well-formed.

Ive tried using ET.fromstring. But with no luck.

like image 275
ellaRT Avatar asked Sep 10 '15 06:09

ellaRT


People also ask

How do you access XML elements in Python?

To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree. parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot().

What is XML write a program of XML parsing in Python?

This article focuses on how one can parse a given XML file and extract some useful data out of it in a structured way. XML: XML stands for eXtensible Markup Language. It was designed to store and transport data. It was designed to be both human- and machine-readable.

Does XML work with Python?

The Python standard library provides a minimal but useful set of interfaces to work with XML. The two most basic and broadly used APIs to XML data are the SAX and DOM interfaces. Simple API for XML (SAX) − Here, you register callbacks for events of interest and then let the parser proceed through the document.

What is XML Etree ElementTree in Python?

The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.


2 Answers

Your XML is malformed. It has to have exactly one top level element. From Wikipedia:

Each XML document has exactly one single root element. It encloses all the other elements and is therefore the sole parent element to all the other elements. ROOT elements are also called PARENT elements.

Try to enclose it within additional tag (e.g. Tables) and than parse with ET:

xmlData = '''<Tables>
<Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table>
<Table><Claimable>false</Claimable><MinorRev>71115</MinorRev><Operation>530600 ION MILL</Operation><Experiment>6794</Experiment><HTNum>162</HTNum><WaferEC>71105</WaferEC><HolderType>HACARR</HolderType><Job>16799006</Job></Table>
</Tables>
'''

import xml.etree.ElementTree as ET
xml = ET.fromstring(xmlData)

for table in xml.getiterator('Table'):
    for child in table:
        print child.tag, child.text

Since Python 2.7 getiterator('Table') should be replaced with iter('Table'):

for table in xml.iter('Table'):
    for child in table:
        print child.tag, child.text

This produces:

Claimable false
MinorRev 80601
Operation 530600 ION MILL
HTNum 162
WaferEC 80318
HolderType HACARR
Job 167187008
Claimable false
MinorRev 71115
Operation 530600 ION MILL
Experiment 6794
HTNum 162
WaferEC 71105
HolderType HACARR
Job 16799006
like image 151
Maciej Lach Avatar answered Oct 25 '22 12:10

Maciej Lach


Maybe you tried node.attrib, try node.text instead to get the string value (also see Parsing XML in the Python docs):

import xml.etree.ElementTree as ET
xml_string = "<Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table>"

root = ET.fromstring(xml_string)

for child in root:
    print child.tag, child.text

This should give you the

Claimable false
MinorRev 80601
Operation 530600 ION MILL
HTNum 162
WaferEC 80318
HolderType HACARR
Job 167187008
like image 39
adrianus Avatar answered Oct 25 '22 12:10

adrianus