Python XML parsing from website

Tags:

python

xml

I am trying to parse XML from a website and I am stuck. I will provide the XML below. I have two questions: what is the best way to read XML from a website, and how do I dig into the XML to get the rate I need?

The figure I need back is the base:OBS_VALUE, 0.12.

What I have so far:

from xml.dom import minidom
import urllib


document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
web = urllib.urlopen(document)
get_web = web.read()
xmldoc = minidom.parseString(document)

ff_DataSet = xmldoc.getElementsByTagName('ff:DataSet')[0]

ff_series = ff_DataSet.getElementsByTagName('ff:Series')[0]

for line in ff_series:
    price = line.getElementsByTagName('base:OBS_VALUE')[0].firstChild.data
    print(price)

XML code from the website:

<Header>
  <ID>FFD</ID>
  <Test>false</Test>
  <Name xml:lang="en">Federal Funds daily averages</Name>
  <Prepared>2013-05-08</Prepared>
  <Sender id="FRBNY">
    <Name xml:lang="en">Federal Reserve Bank of New York</Name>
    <Contact>
      <Name xml:lang="en">Public Information Web Team</Name>
      <Email>ny.piwebteam@ny.frb.org</Email>
    </Contact>
  </Sender>
  <!--ReportingBegin></ReportingBegin-->
</Header>
<ff:DataSet>
  <ff:Series TIME_FORMAT="P1D" DISCLAIMER="G" FF_METHOD="D" DECIMALS="2" AVAILABILITY="A">
    <ffbase:Key>
      <base:FREQ>D</base:FREQ>
      <base:RATE>FF</base:RATE>
      <base:MATURITY>O</base:MATURITY>
      <ffbase:FF_SCOPE>D</ffbase:FF_SCOPE>
    </ffbase:Key>
    <ff:Obs OBS_CONF="F" OBS_STATUS="A">
      <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
      <base:OBS_VALUE>0.12</base:OBS_VALUE>

asked May 08 '13 13:05

Trying_hard

2 Answers

If you wanted to stick with xml.dom.minidom, try this...

from xml.dom import minidom
import urllib

url_str = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'
xml_str = urllib.urlopen(url_str).read()
xmldoc = minidom.parseString(xml_str)

# Collect every base:OBS_VALUE element, matched by its qualified tag name
obs_values = xmldoc.getElementsByTagName('base:OBS_VALUE')
# prints the first base:OBS_VALUE it finds
print(obs_values[0].firstChild.nodeValue)

# prints the second base:OBS_VALUE it finds
print(obs_values[1].firstChild.nodeValue)

# prints all base:OBS_VALUE in the XML document
for obs_val in obs_values:
    print(obs_val.firstChild.nodeValue)

However, if you want to use lxml, use underrun's solution. Also, your original code had some errors. You were actually attempting to parse the document variable, which was the web address. You needed to parse the xml returned from the website, which in your example is the get_web variable.
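To make the point about parsing the downloaded text rather than the address concrete, here is a minimal Python 3 sketch. It is an illustration only: SAMPLE_XML, its Root element, and the urn:example namespace URIs are placeholders invented so the snippet runs offline, and fetch_xml and obs_values are hypothetical helper names. fetch_xml shows the Python 3 spelling of urllib.urlopen.

```python
from urllib.request import urlopen
from xml.dom import minidom

# Placeholder document: the element layout follows the question, but the
# Root element and the urn:example namespace URIs are made up for this sketch.
SAMPLE_XML = """<Root xmlns:ff="urn:example:ff" xmlns:base="urn:example:base">
  <ff:DataSet>
    <ff:Series>
      <ff:Obs>
        <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
        <base:OBS_VALUE>0.12</base:OBS_VALUE>
      </ff:Obs>
    </ff:Series>
  </ff:DataSet>
</Root>"""


def fetch_xml(url):
    """Download url and return the response body as bytes (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def obs_values(xml_text):
    """Return the text of every base:OBS_VALUE element found in xml_text."""
    doc = minidom.parseString(xml_text)
    return [node.firstChild.nodeValue
            for node in doc.getElementsByTagName('base:OBS_VALUE')]


# Parse the XML text itself, never the URL string.
print(obs_values(SAMPLE_XML))
```

With network access, obs_values(fetch_xml(url_str)) would run the same lookup against the live document, assuming the URL still serves XML in this shape.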

answered Oct 01 '22 15:10

b10hazard

Take a look at your code:

document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
web = urllib.urlopen(document)
get_web = web.read()
xmldoc = minidom.parseString(document)

I don't think document holds what you intended. As written it evaluates to http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=dailyr, because the parentheses only group the expression and string literals placed next to each other are concatenated automatically.
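A quick way to see the concatenation for yourself, as a minimal sketch (the name intended is only there for comparison and is not part of your code):

```python
# Adjacent string literals are joined into one string; the surrounding
# parentheses only group the expression and do not create a tuple.
document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
intended = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'

print(document)
print(document == intended + 'r')
```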

After that you do the work to create get_web, but you never use it on the next line. Instead you try to parse document, which is the URL string rather than the XML it points to.
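The difference is easy to reproduce without touching the network. In this small sketch, try_parse is a hypothetical helper and the one-element XML string is made up; it only reports whether minidom accepts the text it is given:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

url = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'


def try_parse(text):
    """Return True if minidom can parse text as XML, False otherwise."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


print(try_parse(url))                  # the address itself is not XML
print(try_parse('<rate>0.12</rate>'))  # a well-formed document is accepted
```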

Beyond that, I would suggest using ElementTree, preferably lxml's implementation (http://lxml.de/). lxml's etree.parse accepts a file-like object, and the object returned by urllib.urlopen is one. Once the rest of your code is straightened out, you could do something like this:

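from lxml import etree
import urllib

url = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'
tree = etree.parse(urllib.urlopen(url))

# XPath needs a prefix-to-URI map. This assumes the ff and base prefixes are
# declared on the root element; the default (None) prefix is not allowed in
# XPath, so it is dropped.
ns = dict((k, v) for k, v in tree.getroot().nsmap.items() if k)

for obs in tree.xpath('//ff:Obs', namespaces=ns):
    # xpath() returns a list, so take the first match before reading .text
    price = obs.xpath('./base:OBS_VALUE', namespaces=ns)[0].text
    print(price)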
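If lxml is not installed, the standard library's xml.etree.ElementTree can do a similar lookup. The following is a Python 3 sketch under stated assumptions, not the feed's real schema: SAMPLE_XML, the Root element, and the urn:example URIs in NS are placeholders, and observations is a hypothetical helper name. For the live document you would put the namespace URIs declared in its root element into NS.

```python
import xml.etree.ElementTree as ET

# Placeholder prefix-to-URI map; the real feed declares its own URIs.
NS = {'ff': 'urn:example:ff', 'base': 'urn:example:base'}

# Placeholder document shaped like the fragment in the question.
SAMPLE_XML = """<Root xmlns:ff="urn:example:ff" xmlns:base="urn:example:base">
  <ff:DataSet>
    <ff:Series>
      <ff:Obs OBS_CONF="F" OBS_STATUS="A">
        <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
        <base:OBS_VALUE>0.12</base:OBS_VALUE>
      </ff:Obs>
    </ff:Series>
  </ff:DataSet>
</Root>"""


def observations(xml_text, ns):
    """Return a list of (time_period, value) pairs, one per ff:Obs element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for obs in root.findall('.//ff:Obs', ns):
        period = obs.findtext('base:TIME_PERIOD', namespaces=ns)
        value = obs.findtext('base:OBS_VALUE', namespaces=ns)
        pairs.append((period, float(value)))
    return pairs


print(observations(SAMPLE_XML, NS))
```

Passing the namespace map explicitly keeps the prefixes in the search paths readable and avoids depending on how the source document happens to name its prefixes.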

answered Oct 01 '22 14:10

underrun
