I'm new to Python and I'm having a particularly difficult time working with XML in Python. The situation is this: I'm trying to count the number of times a word appears in an XML document. Simple enough, but the XML document is a response from a server. Is it possible to do this without writing to a file? It would be great to do it from memory.
Here is a sample XML document:
<xml>
<title>Info</title>
<foo>aldfj</foo>
<data>Text I want to count</data>
</xml>
Here is what I have in Python:
import urllib2
import StringIO
import xml.dom.minidom
from xml.etree.ElementTree import parse
usock = urllib.urlopen('http://www.example.com/file.xml')
xmldoc = minidom.parse(usock)
print xmldoc.toxml()
Past this point I have tried using StringIO, ElementTree, and minidom with no success, and I have gotten to a point where I'm not sure what else to do.
Any help would be greatly appreciated.
To read an XML file using ElementTree, first import the ElementTree module from the xml.etree package, conventionally under the name ET. Then pass the filename of the XML file (or an open file-like object) to ET.parse() to parse it, and call getroot() on the result to get the root (outermost) element.
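As a minimal sketch of those steps in Python 3, the example below parses the sample document from the question out of an in-memory buffer instead of a file on disk. The `io.BytesIO` buffer stands in for the server response, and the variable names are illustrative assumptions rather than anything from the original posts.

```python
import io
import xml.etree.ElementTree as ET

# Sample document from the question, held in memory as bytes.
SAMPLE = b"""<xml>
<title>Info</title>
<foo>aldfj</foo>
<data>Text I want to count</data>
</xml>"""

# ET.parse() accepts a filename or any file-like object, so a BytesIO
# buffer can be used in place of a file on disk.
tree = ET.parse(io.BytesIO(SAMPLE))
root = tree.getroot()

root_tag = root.tag                         # name of the outermost element
child_tags = [child.tag for child in root]  # tags of its direct children
print(root_tag, child_tags)
```

Because nothing is written to disk, the same pattern should apply to any response body that has already been read into memory.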
It's quite simple, as far as I can tell:
import urllib2
from xml.dom import minidom
usock = urllib2.urlopen('http://www.example.com/file.xml')
xmldoc = minidom.parse(usock)
for element in xmldoc.getElementsByTagName('data'):
    print element.firstChild.nodeValue
So to count the occurrences of a string, try this (a bit condensed, but I like one-liners):
count = sum(element.firstChild.nodeValue.count('substring') for element in xmldoc.getElementsByTagName('data'))
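For readers on Python 3, here is a hedged sketch of the same minidom approach, run against an in-memory string. The helper name `count_in_tag` and the second `<data>` element are assumptions added for illustration. The sketch uses `str.count`, which returns the number of non-overlapping matches, whereas `str.find` returns an index (or -1) and so cannot be summed into a count.

```python
from xml.dom import minidom

# Sample from the question, plus one extra <data> element (an assumption)
# so that the sum covers more than one tag.
SAMPLE = """<xml>
<title>Info</title>
<foo>aldfj</foo>
<data>Text I want to count</data>
<data>count this text too</data>
</xml>"""


def count_in_tag(xml_text, tag, substring):
    """Count non-overlapping occurrences of substring in the text of each <tag>."""
    doc = minidom.parseString(xml_text)
    total = 0
    for element in doc.getElementsByTagName(tag):
        node = element.firstChild
        # Skip empty elements and elements whose first child is not text.
        if node is not None and node.nodeType == node.TEXT_NODE:
            total += node.nodeValue.count(substring)
    return total


data_count = count_in_tag(SAMPLE, "data", "count")
print(data_count)
```

Only the first child node of each element is inspected, as in the one-liner above, so text that follows a nested element would not be counted.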
If you are just trying to count the number of times a word appears in an XML document, just read the document as a string and do a count:
import urllib2
data = urllib2.urlopen('http://www.example.com/file.xml').read()
print data.count('foobar')
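One caveat with counting on the raw string is that tag names are counted along with the element text. The Python 3 sketch below illustrates this on the sample document from the question; the variable names are illustrative assumptions.

```python
# Sample document from the question, held in memory as a string.
# In the real case the string would come from the server response,
# e.g. urllib.request.urlopen(url).read().decode() in Python 3.
SAMPLE = """<xml>
<title>Info</title>
<foo>aldfj</foo>
<data>Text I want to count</data>
</xml>"""

# "count" appears only inside the element text.
text_hits = SAMPLE.count("count")

# "data" never appears in the text, yet it is found twice:
# once in the opening <data> tag and once in the closing </data> tag.
tag_hits = SAMPLE.count("data")

print(text_hits, tag_hits)
```

If the word being counted could also appear in a tag or attribute name, parsing the document and counting only within element text is the safer option.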
Otherwise, you can just iterate through the tags you are looking for:
from xml.etree import cElementTree as ET
xml = ET.fromstring(urllib2.urlopen('http://www.example.com/file.xml').read())
for data in xml.getiterator('data'):
    # do something with
    data.text
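A possible Python 3 version of the loop above, filled in to count whole words, might look like the following sketch. The helper name `count_word`, the second `<data>` element, and the choice of case-insensitive whole-word matching are all assumptions made for illustration. It uses `Element.iter()`, the current name for what older versions called `getiterator()`.

```python
import re
import xml.etree.ElementTree as ET

# Sample from the question, plus one extra <data> element (an assumption).
SAMPLE = """<xml>
<title>Info</title>
<foo>aldfj</foo>
<data>Text I want to count</data>
<data>Counting text: count, COUNT</data>
</xml>"""


def count_word(xml_text, tag, word):
    """Count whole-word, case-insensitive matches of word inside every <tag>."""
    root = ET.fromstring(xml_text)
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    total = 0
    for element in root.iter(tag):
        # itertext() yields the element's own text and that of its descendants.
        for chunk in element.itertext():
            total += len(pattern.findall(chunk))
    return total


word_count = count_word(SAMPLE, "data", "count")
print(word_count)
```

Here "Counting" is not counted because the pattern requires a word boundary on both sides, while "COUNT" is counted because matching ignores case.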