Python - Parsing an XML response and finding a specific text value

Tags: python, memory, parsing, xml

I'm new to Python and I'm having a particularly difficult time working with XML. I'm trying to count the number of times a word appears in an XML document. That would be simple enough, but the document is a response from a server. Is it possible to do this without writing the response to a file? It would be great to do it all in memory.

Here is a sample of the XML:

<xml>
  <title>Info</title>
  <foo>aldfj</foo>
  <data>Text I want to count</data>
</xml>

Here is what I have in Python:

import urllib2
import StringIO
import xml.dom.minidom
from xml.etree.ElementTree import parse
usock = urllib.urlopen('http://www.example.com/file.xml') 
xmldoc = minidom.parse(usock)
print xmldoc.toxml()

Past this point I have tried using StringIO, ElementTree, and minidom without success, and I'm not sure what else to do.

Any help would be greatly appreciated.

asked Oct 05 '11 21:10

Jason

2 Answers

It's quite simple, as far as I can tell:

import urllib2
from xml.dom import minidom

usock = urllib2.urlopen('http://www.example.com/file.xml') 
xmldoc = minidom.parse(usock)

for element in xmldoc.getElementsByTagName('data'):
  print element.firstChild.nodeValue

So to count the occurrences of a string, try this (a bit condensed, but I like one-liners):

count = sum(element.firstChild.nodeValue.count('substring') for element in xmldoc.getElementsByTagName('data'))
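
For anyone on Python 3, here is a minimal sketch of the same idea that works entirely in memory. It is not the answer's original code: the helper name `count_in_data` and the inline `SAMPLE_XML` string are made up for illustration, and the string stands in for the body you would read from the server response. It tallies with `str.count`, because `str.find` returns a position rather than a number of occurrences, and it guards against empty elements, where `firstChild` is `None`.

```python
from xml.dom import minidom

# Stand-in for the server response body (hypothetical sample, copied from the
# question). In Python 3 the real body would typically be read from the
# response object returned by urllib.request.urlopen(url).
SAMPLE_XML = """<xml>
  <title>Info</title>
  <foo>aldfj</foo>
  <data>Text I want to count</data>
</xml>"""


def count_in_data(xml_text, word, tag="data"):
    """Count non-overlapping occurrences of `word` inside every <tag> element."""
    doc = minidom.parseString(xml_text)  # parses from memory, no file needed
    total = 0
    for element in doc.getElementsByTagName(tag):
        node = element.firstChild
        # firstChild is None for an empty element such as <data/>, so guard it.
        if node is not None and node.nodeType == node.TEXT_NODE:
            total += node.nodeValue.count(word)
    return total


print(count_in_data(SAMPLE_XML, "count"))
```

Note that this only looks at the first text node of each element, just like the one-liner above, so text that follows a nested child element would be missed.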

answered Oct 12 '22 15:10

Blender

If you are just trying to count the number of times a word appears in an XML document, just read the document as a string and do a count:

import urllib2
data = urllib2.urlopen('http://www.example.com/file.xml').read()
print data.count('foobar')
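
One caveat with counting on the raw string: tag names and attribute values are part of the string, so they get counted too. A rough Python 3 sketch of the difference between a plain substring count and a whole-word count follows. The sample string and both function names are hypothetical, and in Python 3 a real response body would arrive as bytes and need decoding first.

```python
import re

# Hypothetical response body, already decoded to text.
SAMPLE_XML = "<xml><title>Info</title><data>data about data, and metadata</data></xml>"


def count_substring(text, word):
    """Plain substring count: matches inside longer words and inside tags."""
    return text.count(word)


def count_whole_word(text, word):
    """Count `word` only where it is delimited by word boundaries."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))


print(count_substring(SAMPLE_XML, "data"))
print(count_whole_word(SAMPLE_XML, "data"))
```

The whole-word version skips "metadata", but both versions still count the `<data>` and `</data>` tags themselves. If the markup must be excluded, parse the document and count within the element text instead.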

Otherwise, you can just iterate through the tags you are looking for:

import urllib2
from xml.etree import cElementTree as ET

xml = ET.fromstring(urllib2.urlopen('http://www.example.com/file.xml').read())
for data in xml.iter('data'):  # iter() replaces the deprecated getiterator() in Python 2.7+
    print data.text  # do something with data.text
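
To turn that loop into an actual count, something along these lines should work. This is a sketch for Python 3 (where `cElementTree` no longer exists and plain `xml.etree.ElementTree` is used instead). The function name `count_word_in_tag` and the sample string are assumptions for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the response body read from the server.
SAMPLE_XML = """<xml>
  <title>Info</title>
  <foo>aldfj</foo>
  <data>Text I want to count</data>
</xml>"""


def count_word_in_tag(xml_text, word, tag="data"):
    """Sum the occurrences of `word` in the text of every <tag> element."""
    root = ET.fromstring(xml_text)  # parse straight from the in-memory string
    # element.text is None for empty elements, so fall back to "".
    return sum((element.text or "").count(word) for element in root.iter(tag))


print(count_word_in_tag(SAMPLE_XML, "count"))
```

`ET.fromstring` accepts the response body directly, so nothing has to be written to disk.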

answered Oct 12 '22 15:10

Derek Springer
