Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use xmltodict to get items out of an xml file

I am trying to easily access values from an xml file.

<artikelen>
    <artikel nummer="121">
        <code>ABC123</code>
        <naam>Highlight pen</naam>
        <voorraad>231</voorraad>
        <prijs>0.56</prijs>
    </artikel>
    <artikel nummer="123">
        <code>PQR678</code>
        <naam>Nietmachine</naam>
        <voorraad>587</voorraad>
        <prijs>9.99</prijs>
    </artikel>
..... etc

If i want to acces the value ABC123, how do I get it?

import xmltodict

with open('8_1.html') as fd:
    doc = xmltodict.parse(fd.read())
    print(doc[fd]['code'])
like image 275
54m Avatar asked Oct 20 '16 12:10

54m


People also ask

What is Xmltodict?

xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec": >>> print(json. dumps(xmltodict.

How do you parse an XML string in Python?

There are two ways to parse the file using 'ElementTree' module. The first is by using the parse() function and the second is fromstring() function. The parse () function parses XML document which is supplied as a file whereas, fromstring parses XML when supplied as a string i.e within triple quotes.


1 Answers

Using your example:

import xmltodict

with open('artikelen.xml') as fd:
    doc = xmltodict.parse(fd.read())

If you examine doc, you'll see it's an OrderedDict, ordered by tag:

>>> doc
OrderedDict([('artikelen',
              OrderedDict([('artikel',
                            [OrderedDict([('@nummer', '121'),
                                          ('code', 'ABC123'),
                                          ('naam', 'Highlight pen'),
                                          ('voorraad', '231'),
                                          ('prijs', '0.56')]),
                             OrderedDict([('@nummer', '123'),
                                          ('code', 'PQR678'),
                                          ('naam', 'Nietmachine'),
                                          ('voorraad', '587'),
                                          ('prijs', '9.99')])])]))])

The root node is called artikelen, and there a subnode artikel which is a list of OrderedDict objects, so if you want the code for every article, you would do:

codes = []
for artikel in doc['artikelen']['artikel']:
    codes.append(artikel['code'])

# >>> codes
# ['ABC123', 'PQR678']

If you specifically want the code only when nummer is 121, you could do this:

code = None
for artikel in doc['artikelen']['artikel']:
    if artikel['@nummer'] == '121':
        code = artikel['code']
        break

That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions, which are supported by ElementTree.

like image 99
Paul Avatar answered Nov 16 '22 01:11

Paul