Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Python - Parsing a response xml and finding a specific text vaule

I'm new to python and I'm having a particularly difficult time working with xml and python. The situation I have is this, I'm trying to count the number of times a word appears in an xml document. Simple enough, but the xml document is a response from a server. Is it possible to do this without writing to a file? It would be great trying to do it from memory.

Here is a sample xml code:

<xml>
  <title>Info</title>
    <foo>aldfj</foo>
      <data>Text I want to count</data>
</xml>

Here is what I have in python

import urllib2
import StringIO
import xml.dom.minidom
from xml.etree.ElementTree import parse
usock = urllib.urlopen('http://www.example.com/file.xml') 
xmldoc = minidom.parse(usock)
print xmldoc.toxml()

Past This point I have tried using StringIO, ElementTree, and minidom to no success and I have gotten to a point where I'm not sure what else to do.

Any help would be greatly appreciated

like image 407
Jason Avatar asked Oct 05 '11 21:10

Jason


People also ask

How do you read and parse XML in Python?

To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree. parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot().

How do you access XML elements in Python?

Example Read XML File in Python To read an XML file, firstly, we import the ElementTree class found inside the XML library. Then, we will pass the filename of the XML file to the ElementTree. parse() method, to start parsing. Then, we will get the parent tag of the XML file using getroot() .


2 Answers

It's quite simple, as far as I can tell:

import urllib2
from xml.dom import minidom

usock = urllib2.urlopen('http://www.example.com/file.xml') 
xmldoc = minidom.parse(usock)

for element in xmldoc.getElementsByTagName('data'):
  print element.firstChild.nodeValue

So to count the occurrences of a string, try this (a bit condensed, but I like one-liners):

count = sum(element.firstChild.nodeValue.find('substring') for element in xmldoc.getElementsByTagName('data'))
like image 86
Blender Avatar answered Oct 12 '22 15:10

Blender


If you are just trying to count the number of times a word appears in an XML document, just read the document as a string and do a count:

import urllib2
data = urllib2.urlopen('http://www.example.com/file.xml').read()
print data.count('foobar')

Otherwise, you can just iterate through the tags you are looking for:

from xml.etree import cElementTree as ET
xml = ET.fromstring(urllib2.urlopen('http://www.example.com/file.xml').read())
for data in xml.getiterator('data'):
    # do something with
    data.text
like image 36
Derek Springer Avatar answered Oct 12 '22 15:10

Derek Springer