I want to write a script that parses OpenStreetMap (OSM) XML files and builds a database of towns and cities in a hierarchical fashion. I want the resulting data set to have a hierarchy that might look like this in the US:
USA -> California -> San Francisco County -> San Francisco
and maybe like this in the UK:
United Kingdom -> England -> Middlesex -> London -> Soho
The output will be a JSON document that describes a hierarchy for all cities in the OSM file, with a structure like the examples above.
I'm using Python and the "imposm" parser library, and I can load and parse the file without a problem. My issue is a lack of understanding of how the OSM data is structured: I don't know how to determine the parent/child relationship between nodes in OSM's data. For example, if I locate the node for "Soho", how can I tie it back to the nodes for "City of Westminster", "Greater London", "Middlesex" and "England"?
I know that some nodes have an "is_in" tag that might give some of this information, but
Please let me know if you have any suggestions for how to link these nodes hierarchically.
To extract the data, browse to the OpenStreetMap website and use the search, pan, and zoom tools, as in any other web map, to find the area you want data for. Then use the Export tool in the top menu bar and confirm the bounding box before exporting the data. That's it!
An .osm file is an XML file, which normally doesn't have to be renamed to *.xml.
An OSM file is a street map saved in the OpenStreetMap (OSM) format. It contains XML-formatted data in the form of "nodes" (points), "ways" (ordered lists of nodes that form lines or areas), and "relations" (groups of nodes, ways, and other relations). Street and object properties are stored as tags on these elements.
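To make the three element types concrete, here is a minimal sketch that reads them with Python's standard `xml.etree.ElementTree` instead of imposm. The sample XML is hand-written for illustration (ids, coordinates, and the "Example Area" relation are made up), and the `summarize` and `tags_of` helpers are hypothetical names, not part of any library.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in OSM XML layout; all ids and values are made up.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="51.5137" lon="-0.1337">
    <tag k="place" v="suburb"/>
    <tag k="name" v="Soho"/>
  </node>
  <node id="2" lat="51.51" lon="-0.14"/>
  <node id="3" lat="51.52" lon="-0.14"/>
  <way id="10">
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
  <relation id="100">
    <member type="way" ref="10" role="outer"/>
    <tag k="type" v="boundary"/>
    <tag k="name" v="Example Area"/>
  </relation>
</osm>"""


def tags_of(element):
    """Collect the <tag k=... v=...> children of an element into a dict."""
    return {t.get("k"): t.get("v") for t in element.findall("tag")}


def summarize(xml_text):
    """Split an OSM XML document into nodes, ways, and relations."""
    root = ET.fromstring(xml_text)
    nodes = {
        n.get("id"): {
            "lat": float(n.get("lat")),
            "lon": float(n.get("lon")),
            "tags": tags_of(n),
        }
        for n in root.findall("node")
    }
    ways = {
        w.get("id"): {
            # A way is an ordered list of references to node ids.
            "node_refs": [nd.get("ref") for nd in w.findall("nd")],
            "tags": tags_of(w),
        }
        for w in root.findall("way")
    }
    relations = {
        r.get("id"): {
            # A relation lists typed members, each with an optional role.
            "members": [
                (m.get("type"), m.get("ref"), m.get("role"))
                for m in r.findall("member")
            ],
            "tags": tags_of(r),
        }
        for r in root.findall("relation")
    }
    return {"nodes": nodes, "ways": ways, "relations": relations}


if __name__ == "__main__":
    result = summarize(SAMPLE_OSM)
    print(result["nodes"]["1"]["tags"])
    print(result["ways"]["10"]["node_refs"])
    print(result["relations"]["100"]["members"])
```

Loading a whole file this way keeps everything in memory, so for large extracts a streaming parser is likely the better fit.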
Basically everything is "free-form" in OSM. There are conventions on tagging, but there is no guarantee people will stick to them. So you will need to do some data cleaning and postprocessing to get anything consistent.
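As for parent-child relationships, there are no hard-wired relationships in OSM other than the structural ones: ways reference the nodes they are made of, and relations reference their member nodes, ways, and other relations.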
OSM relations can be used to define hierarchical relationships, but the way these are defined is very generic. The semantics is based on conventions (usually described on OSM Wiki pages).
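One such convention is tagging administrative boundary relations with `boundary=administrative` and a numeric `admin_level`, where lower numbers denote larger areas (2 is conventionally the country level; what the other levels mean varies by country). The sketch below assumes you have already collected relation tags into a plain dict keyed by relation id; that input shape, the sample ids, and the `admin_boundaries` helper are assumptions for illustration, not something imposm hands you in this form.

```python
def admin_boundaries(relations):
    """Return (admin_level, name, relation_id) tuples, coarsest level first.

    `relations` maps a relation id to its tag dict (an assumed input shape).
    """
    found = []
    for rel_id, tags in relations.items():
        if tags.get("boundary") != "administrative":
            continue
        try:
            level = int(tags.get("admin_level", ""))
        except ValueError:
            # Tagging is free-form: skip missing or non-numeric levels.
            continue
        found.append((level, tags.get("name", "<unnamed>"), rel_id))
    return sorted(found)


# Made-up sample data; the ids are not real OSM relation ids.
SAMPLE_RELATIONS = {
    1: {"boundary": "administrative", "admin_level": "4", "name": "England"},
    2: {"boundary": "administrative", "admin_level": "2", "name": "United Kingdom"},
    3: {"type": "route", "name": "Some bus route"},
    4: {"boundary": "administrative", "admin_level": "six", "name": "Badly tagged"},
}

if __name__ == "__main__":
    for level, name, rel_id in admin_boundaries(SAMPLE_RELATIONS):
        print(level, name, rel_id)
```

This only tells you which areas exist and how coarse they are. It does not say which area contains which, so it complements rather than replaces a geometric containment check.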
If you're looking for an "is_in" relationship, I think you will need to establish it using geometric methods. You cannot really rely just on OSM tagging for this, unfortunately.
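As a rough illustration of the geometric approach, the sketch below tests which boundary polygons contain a place's coordinates and orders the matches from largest to smallest area. It is a simplification under stated assumptions: each area is a single closed ring of (lon, lat) vertices, the rectangles and names are invented, and `point_in_polygon` and `containing_areas` are hypothetical helpers. Real OSM boundaries are multipolygon relations whose member ways must be stitched together first, and a geometry library such as Shapely would probably be more robust than hand-rolled ray casting.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; `ring` is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def containing_areas(lon, lat, areas):
    """Names of all areas containing the point, coarsest admin_level first."""
    hits = [a for a in areas if point_in_polygon(lon, lat, a["ring"])]
    return [a["name"] for a in sorted(hits, key=lambda a: a["admin_level"])]


# Invented rectangular "boundaries"; not real administrative outlines.
AREAS = [
    {"name": "City C", "admin_level": 8,
     "ring": [(-0.6, 51.2), (0.4, 51.2), (0.4, 51.8), (-0.6, 51.8)]},
    {"name": "Country A", "admin_level": 2,
     "ring": [(-10.0, 49.0), (2.0, 49.0), (2.0, 61.0), (-10.0, 61.0)]},
    {"name": "Region B", "admin_level": 4,
     "ring": [(-6.0, 50.0), (2.0, 50.0), (2.0, 56.0), (-6.0, 56.0)]},
]

if __name__ == "__main__":
    # A point inside all three rectangles, then one outside the city.
    print(" -> ".join(containing_areas(-0.1337, 51.5137, AREAS)))
    print(" -> ".join(containing_areas(-3.0, 53.0, AREAS)))
```

The ordered list of names for each place is the parent chain you would then nest into the JSON output.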