
How can I extract hierarchical city/state/country data from OSM XML planet files?

I want to write a script that parses OpenStreetMap (OSM) XML files and builds a database of towns and cities in a hierarchical fashion. I want the resulting data set to have a hierarchy that might look like this in the US:

USA -> California -> San Francisco County -> San Francisco

and maybe like this in the UK:

United Kingdom -> England -> Middlesex -> London -> Soho

The output will be a JSON document that describes a hierarchy for all cities in the OSM file, with a structure like the examples above.

I'm using Python and the "imposm" parser library, and I can load and parse the file without a problem. My issue is that I don't understand how the OSM data is structured: I don't know how to determine the parent/child relationship between nodes in OSM's data. For example, if I locate the node for "Soho", how can I tie it back to the nodes for "City of Westminster", "Greater London", "Middlesex" and "England"?

I know that some nodes have an "is_in" tag that might give some of this information, but

  • A) this is inconsistent and
  • B) it seems to be a free-form text field, not a link to an OSM node (i.e. is_in: "City of Westminster" does not give me any link to the Westminster node).

Please let me know if you have any suggestions for how to link these nodes hierarchically.

luke asked Sep 16 '11 10:09





1 Answer

Basically everything is "free-form" in OSM. There are conventions on tagging, but there is no guarantee people will stick to them. So you will need to do some data cleaning and postprocessing to get anything consistent.

As for parent-child relationships, there are no hard-wired relationships in OSM other than:

  • A node can be used by one or more ways
  • A node can be a member of one or more relations
  • A way can be a member of one or more relations
  • A relation can be a member of one or more relations

OSM relations can be used to define hierarchical relationships, but the way these are defined is very generic. Their semantics are based on conventions (usually described on OSM Wiki pages).
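To illustrate how much depends on convention, here is a minimal sketch that uses only the Python standard library (not imposm) to pick out relations following the common boundary=administrative / admin_level tagging convention. The XML fragment is hand-written for the example: the ids, member refs and admin_level values are made up, and the helper name admin_boundaries is hypothetical. Note that the code has to tolerate missing or non-numeric tags, because nothing in the data model enforces them.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in OSM XML layout; all ids and values are illustrative.
SAMPLE_OSM = """<osm version="0.6">
  <relation id="1">
    <member type="way" ref="100" role="outer"/>
    <member type="node" ref="10" role="admin_centre"/>
    <tag k="type" v="boundary"/>
    <tag k="boundary" v="administrative"/>
    <tag k="admin_level" v="4"/>
    <tag k="name" v="England"/>
  </relation>
  <relation id="2">
    <member type="way" ref="200" role="outer"/>
    <tag k="type" v="boundary"/>
    <tag k="boundary" v="administrative"/>
    <tag k="admin_level" v="8"/>
    <tag k="name" v="City of Westminster"/>
  </relation>
  <relation id="3">
    <member type="way" ref="300" role=""/>
    <tag k="type" v="route"/>
    <tag k="name" v="Bus 24"/>
  </relation>
</osm>"""


def admin_boundaries(xml_text):
    """Return administrative boundary relations, coarsest admin_level first."""
    root = ET.fromstring(xml_text)
    found = []
    for rel in root.iter("relation"):
        # Tags are plain key/value strings; collect them into a dict.
        tags = {t.get("k"): t.get("v") for t in rel.findall("tag")}
        if tags.get("boundary") != "administrative":
            continue
        level = tags.get("admin_level", "")
        if not level.isdigit():
            continue  # free-form data: skip values that are not plain integers
        members = [
            (m.get("type"), int(m.get("ref")), m.get("role"))
            for m in rel.findall("member")
        ]
        found.append({
            "id": int(rel.get("id")),
            "name": tags.get("name"),
            "admin_level": int(level),
            "members": members,
        })
    return sorted(found, key=lambda b: b["admin_level"])


boundaries = admin_boundaries(SAMPLE_OSM)
for b in boundaries:
    print(b["admin_level"], b["name"], b["members"])
```

The members only reference the ways that make up the outline. Nothing here links one boundary relation to the one that contains it, so the nesting still has to be worked out separately.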

If you're looking for an "is_in" relationship, I think you will need to establish it using geometric methods. You cannot really rely just on OSM tagging for this, unfortunately.
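As a rough sketch of what "geometric methods" could look like: test which boundary polygons contain the place's coordinates, then order the matches from coarsest to finest. Everything below is an assumption for illustration only. The rectangles stand in for real boundaries, the admin_level values are illustrative, and the function names are hypothetical. Real OSM boundaries would first have to be assembled from the member ways of each relation (possibly with holes and multiple outer rings), and treating longitude/latitude as flat coordinates is only a crude approximation.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as a list of (x, y)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical, heavily simplified areas as (lon, lat) rectangles.
AREAS = [
    {"name": "City of Westminster", "admin_level": 8,
     "polygon": [(-0.22, 51.48), (-0.11, 51.48), (-0.11, 51.54), (-0.22, 51.54)]},
    {"name": "England", "admin_level": 4,
     "polygon": [(-6.0, 50.0), (2.0, 50.0), (2.0, 56.0), (-6.0, 56.0)]},
    {"name": "Manchester", "admin_level": 8,
     "polygon": [(-2.3, 53.4), (-2.1, 53.4), (-2.1, 53.55), (-2.3, 53.55)]},
    {"name": "Greater London", "admin_level": 6,
     "polygon": [(-0.5, 51.3), (0.3, 51.3), (0.3, 51.7), (-0.5, 51.7)]},
]


def hierarchy_for(point, areas):
    """Names of all areas containing the point, coarsest admin_level first."""
    containing = [a for a in areas if point_in_polygon(point, a["polygon"])]
    containing.sort(key=lambda a: a["admin_level"])
    return [a["name"] for a in containing]


soho = (-0.1337, 51.5137)  # approximate (lon, lat)
chain = hierarchy_for(soho, AREAS) + ["Soho"]
print(" -> ".join(chain))
```

For a whole planet file, testing every place against every polygon like this would be far too slow. A spatial index or a spatially enabled database would be the more realistic choice, but the containment idea stays the same.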

Igor Brejc answered Oct 01 '22 21:10
