Python XML parsing from website

Tags: python, xml

I am trying to parse XML from a website and I am stuck. The XML is provided below. I have two questions: what is the best way to read XML from a website, and how do I dig into the XML to get the rate I need?

The figure I need back is base:OBS_VALUE, which is 0.12.

What I have so far:

from xml.dom import minidom
import urllib


document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
web = urllib.urlopen(document)
get_web = web.read()
xmldoc = minidom.parseString(document)

ff_DataSet = xmldoc.getElementsByTagName('ff:DataSet')[0]

ff_series = ff_DataSet.getElementsByTagName('ff:Series')[0]

for line in ff_series:
    price = line.getElementsByTagName('base:OBS_VALUE')[0].firstChild.data
    print(price)

XML code from website:

<Header>
  <ID>FFD</ID>
  <Test>false</Test>
  <Name xml:lang="en">Federal Funds daily averages</Name>
  <Prepared>2013-05-08</Prepared>
  <Sender id="FRBNY">
    <Name xml:lang="en">Federal Reserve Bank of New York</Name>
    <Contact>
      <Name xml:lang="en">Public Information Web Team</Name>
      <Email>[email protected]</Email>
    </Contact>
  </Sender>
  <!--ReportingBegin></ReportingBegin-->
</Header>
<ff:DataSet>
  <ff:Series TIME_FORMAT="P1D" DISCLAIMER="G" FF_METHOD="D" DECIMALS="2" AVAILABILITY="A">
    <ffbase:Key>
      <base:FREQ>D</base:FREQ>
      <base:RATE>FF</base:RATE>
      <base:MATURITY>O</base:MATURITY>
      <ffbase:FF_SCOPE>D</ffbase:FF_SCOPE>
    </ffbase:Key>
    <ff:Obs OBS_CONF="F" OBS_STATUS="A">
      <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
      <base:OBS_VALUE>0.12</base:OBS_VALUE>

Trying_hard asked May 08 '13 13:05


2 Answers

If you wanted to stick with xml.dom.minidom, try this...

from xml.dom import minidom
import urllib

url_str = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'
xml_str = urllib.urlopen(url_str).read()
xmldoc = minidom.parseString(xml_str)

obs_values = xmldoc.getElementsByTagName('base:OBS_VALUE')
# prints the first base:OBS_VALUE it finds
print obs_values[0].firstChild.nodeValue

# prints the second base:OBS_VALUE it finds
print obs_values[1].firstChild.nodeValue

# prints all base:OBS_VALUE in the XML document
for obs_val in obs_values:
    print obs_val.firstChild.nodeValue

However, if you want to use lxml, use underrun's solution. Also, your original code had some errors. You were actually attempting to parse the document variable, which was the web address. You needed to parse the xml returned from the website, which in your example is the get_web variable.
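The code in this thread is Python 2. As a rough sketch of the same minidom approach on Python 3, something like the following might work. It parses an inline sample instead of the live feed; the root element name and the namespace URIs in SAMPLE_XML are placeholders I made up, and fetch_xml and obs_values are hypothetical helper names.

```python
from urllib.request import urlopen
from xml.dom import minidom

# Stand-in for the feed. The <Feed> root and the urn:example:* namespace
# URIs are placeholders, not taken from the real document.
SAMPLE_XML = """<Feed xmlns:ff="urn:example:ff" xmlns:base="urn:example:base">
  <ff:DataSet>
    <ff:Series>
      <ff:Obs OBS_CONF="F" OBS_STATUS="A">
        <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
        <base:OBS_VALUE>0.12</base:OBS_VALUE>
      </ff:Obs>
    </ff:Series>
  </ff:DataSet>
</Feed>"""


def fetch_xml(url):
    """Download the response body as bytes (Python 3 spelling of urllib.urlopen)."""
    with urlopen(url) as response:
        return response.read()


def obs_values(xml_data):
    """Return the text of every base:OBS_VALUE element, in document order."""
    # Parse the downloaded data, not the URL string.
    xmldoc = minidom.parseString(xml_data)
    nodes = xmldoc.getElementsByTagName('base:OBS_VALUE')
    return [node.firstChild.nodeValue for node in nodes]


if __name__ == '__main__':
    print(obs_values(SAMPLE_XML)[0])
```

Against the real site you would pass fetch_xml(url_str) to obs_values instead of SAMPLE_XML, assuming the feed is still served at that address.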

b10hazard answered Oct 01 '22 15:10


Take a look at your code:

document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
web = urllib.urlopen(document)
get_web = web.read()
xmldoc = minidom.parseString(document)

I'm not sure document is what you intended, unless you really want http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=dailyr, because that is what you get: the parentheses only group here, and string literals written next to each other are concatenated automatically.
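A quick way to see the concatenation for yourself is the small check below. The pair variable is only there for contrast; my guess is that the stray 'r' was meant as a file mode, as in open(path, 'r'), but that is an assumption.

```python
# Adjacent string literals are joined into one string; the parentheses only group.
document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')

# A comma between the literals is what would make this a tuple of two strings.
pair = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily', 'r')

print(type(document).__name__, document[-6:])  # str dailyr
print(type(pair).__name__, len(pair))          # tuple 2
```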

After that you do some work to create get_web but then you don't use it in the next line. Instead you try to parse your document which is the url...

Beyond that, I would strongly suggest you use ElementTree, preferably lxml's ElementTree (http://lxml.de/). lxml's etree parser also accepts a file-like object, which can be the object returned by urllib.urlopen. With that, and after straightening out the rest of your code, you could do this:

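from lxml import etree
import urllib

url = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'
tree = etree.parse(urllib.urlopen(url))

# Reuse the prefixes declared on the root element (assumes ff and base are
# declared there); XPath cannot take the default (None) prefix, so skip it.
ns = dict((prefix, uri) for prefix, uri in tree.getroot().nsmap.items() if prefix)

for obs in tree.xpath('//ff:Obs', namespaces=ns):
    # xpath() returns a list, so take the first match before reading .text
    price = obs.xpath('./base:OBS_VALUE', namespaces=ns)[0].text
    print(price)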
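
If installing lxml is not an option, a similar namespace-aware lookup can be sketched with the standard library's xml.etree.ElementTree on Python 3. This swaps lxml for the stdlib module, so it is not the code above, just the same idea. The namespace URIs and the root element in SAMPLE_XML are placeholders I made up (the real ones come from the xmlns declarations in the feed), and observations is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder prefix-to-URI map; replace the URIs with the ones declared
# in the real feed's xmlns attributes.
NAMESPACES = {'ff': 'urn:example:ff', 'base': 'urn:example:base'}

# Stand-in document using the same placeholder namespaces.
SAMPLE_XML = """<Feed xmlns:ff="urn:example:ff" xmlns:base="urn:example:base">
  <ff:DataSet>
    <ff:Series>
      <ff:Obs OBS_CONF="F" OBS_STATUS="A">
        <base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
        <base:OBS_VALUE>0.12</base:OBS_VALUE>
      </ff:Obs>
    </ff:Series>
  </ff:DataSet>
</Feed>"""


def observations(xml_text):
    """Yield (date, rate) pairs for every ff:Obs element in the document."""
    root = ET.fromstring(xml_text)
    for obs in root.iterfind('.//ff:Obs', NAMESPACES):
        date = obs.findtext('base:TIME_PERIOD', namespaces=NAMESPACES)
        rate = obs.findtext('base:OBS_VALUE', namespaces=NAMESPACES)
        yield date, float(rate)


if __name__ == '__main__':
    for date, rate in observations(SAMPLE_XML):
        print(date, rate)
```

Spelling out the prefix map keeps the queries readable, at the cost of having to copy the URIs from the feed once.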

underrun answered Oct 01 '22 14:10