
XML parsing from a web response

I'm trying to get a response from Nominatim to geocode a few thousand cities.

import os
import requests
import xml.etree.ElementTree as ET

txt = open('input.txt', 'r').readlines()
for line in txt:
 lp, region, district, municipality, city = line.split('\t')
 baseUrl = 'http://nominatim.openstreetmap.org/search/gb/'+region+'/'+district+'/'+municipality+'/'+city+'/?format=xml' 
 # eg. http://nominatim.openstreetmap.org/search/pl/podkarpackie/stalowowolski/Bojan%C3%B3w/Zapu%C5%9Bcie/?format=xml
 resp = requests.get(baseUrl)
 resp.encoding = 'UTF-8' # special diacritics
 msg = resp.text
 # parse response to get lat & long
 tree = ET.parse(msg)
 root = tree.getroot()
 print tree

but the result is:

Traceback (most recent call last):
File "geo_miasta.py", line 17, in <module>
    tree = ET.parse(msg)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 647, in parse
    source = open(source, "rb")    
IOError: [Errno 2] No such file or directory: u'<?xml version="1.0" encoding="UTF-8" ?>\n<searchresults timestamp=\'Tue, 11 Feb 14 21:13:50 +0000\' attribution=\'Data \xa9 OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright\' querystring=\'\u015awierczyna, Drzewica, opoczy\u0144ski, \u0142\xf3dzkie, gb\' polygon=\'false\' more_url=\'http://nominatim.openstreetmap.org/search?format=xml&amp;exclude_place_ids=&amp;q=%C5%9Awierczyna%2C+Drzewica%2C+opoczy%C5%84ski%2C+%C5%82%C3%B3dzkie%2C+gb\'>\n</searchresults>'

What is wrong with this?

Edit: Thanks to @rob, my solution is:

#! /usr/bin/env python2.7
# -*- coding: utf-8 -*-

import os
import requests
import xml.etree.ElementTree as ET

txt = open('input.txt', 'r').read().split('\n')

for line in txt:
    lp, region, district, municipality, city = line.split('\t')
    baseUrl = 'http://nominatim.openstreetmap.org/search/pl/'+region+'/'+district+'/'+municipality+'/'+city+'/?format=xml'
    resp = requests.get(baseUrl)
    msg = resp.content
    tree = ET.fromstring(msg)
    for place in tree.findall('place'):
        location = '{:5f}\t{:5f}'.format(
            float(place.get('lat')),
            float(place.get('lon')))

        # append one tab-separated record per place found
        f = open('result.txt', 'a')
        f.write(location+'\t'+region+'\t'+district+'\t'+municipality+'\t'+city+'\n')
        f.close()

m93 asked Feb 11 '14 21:02


1 Answer

You are using xml.etree.ElementTree.parse(), which takes a filename or a file object as its argument. You are passing neither: msg is a unicode string containing the XML itself, so parse() tries to open that string as a filename and fails with the IOError shown in your traceback.

Try xml.etree.ElementTree.fromstring(text).

Like this:

 tree = ET.fromstring(msg)
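
To make the difference concrete, here is a minimal offline sketch. The SAMPLE document and the coords() helper are made up for illustration (shaped roughly like a Nominatim reply, not real data); only fromstring(), parse() and io.BytesIO come from the standard library.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a Nominatim XML reply; the values are made up.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b'<searchresults>\n'
    b'  <place display_name="Example Place" lat="50.5" lon="22.0"/>\n'
    b'</searchresults>'
)

# fromstring() parses XML data directly and returns the root element.
root = ET.fromstring(SAMPLE)

# parse() expects a filename or file object, so the bytes are wrapped in
# BytesIO here; it returns an ElementTree, hence the getroot() call below.
tree = ET.parse(io.BytesIO(SAMPLE))


def coords(element):
    """Illustrative helper: (lat, lon) pairs of all <place> children."""
    return [(float(p.get('lat')), float(p.get('lon')))
            for p in element.findall('place')]


print(coords(root))
print(coords(tree.getroot()))
```

Both routes end up with the same elements. For a response body that is already in memory, fromstring() is simply the shorter path.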

Here is a complete sample program:

import os
import requests
import xml.etree.ElementTree as ET

baseUrl = 'http://nominatim.openstreetmap.org/search/pl/podkarpackie/stalowowolski/Bojan%C3%B3w/Zapu%C5%9Bcie\n/?format=xml'
resp = requests.get(baseUrl)
msg = resp.content
tree = ET.fromstring(msg)
for place in tree.findall('place'):
  print u'{:s}: {:+.2f}, {:+.2f}'.format(
    place.get('display_name'),
    float(place.get('lon')),
    float(place.get('lat'))).encode('utf-8')
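
Note that the reply in the question's traceback is an empty searchresults element with no place children, so when geocoding thousands of cities some lookups will likely return nothing. A hedged sketch of one way to handle that, assuming you only want the first hit; first_coords() and the two sample replies are hypothetical names and made-up data, not part of Nominatim or requests:

```python
import xml.etree.ElementTree as ET


def first_coords(xml_bytes):
    """Return (lat, lon) of the first <place>, or None if the reply has none."""
    root = ET.fromstring(xml_bytes)
    place = root.find('place')  # find() returns None when nothing matches
    if place is None:
        return None
    return float(place.get('lat')), float(place.get('lon'))


# Hypothetical replies for illustration; attribute values are made up.
EMPTY = b"<searchresults querystring='x'>\n</searchresults>"
ONE_HIT = b"<searchresults><place lat='51.45' lon='20.47'/></searchresults>"

print(first_coords(EMPTY))
print(first_coords(ONE_HIT))
```

In the real loop you would pass resp.content to such a helper and skip or log the cities for which it returns None.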

Robᵩ answered Sep 22 '22 10:09