Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to build a 'roads' graph in a database from an .osm file

Required result:

A database with two tables, 'edges' and 'nodes', which can be used to generate a graph which represents all drivable roads (highways) in a certain area (e.g. city). The dataset is parsed from an .osm file.

Background:

I'm going for my second try of creating a database with two tables - nodes and edges which can be used to do different shortest path calculations on it.

First I tried to manually extract different data into the database (php scripts):

  • parse all the nodes from the .osm file into 'nodes' table
  • parse all the edges into the 'edges' table (that is parse the ways and generate edges (consisting of a starting node and a destination node) from those, adding the tags to the edges simultaneously;
  • delete all the edges that have blacklisted tags (that are not highways) from the edges table;
  • delete all nodes that are not present in the edges table from the nodes table.

The problem is that this sequence is very unreliable and it works 'so-so' only on small datasets, like thousands of nodes, but not millions.

Problem:

With a database consisting of millions of nodes and edges, road-extracting (i.g. generating edges based on a certain tags filter) is extremely slow.

Possible solution:

What I'm going to do this time:

  • set up PostgreSQL rdbms;
  • parse the .osm file with OSM ready-made scripts to get full nodes, ways and relations tables;
  • extract all edges with certain tags (e.g. 'highway') into a new 'edges' table from the ways table;
  • extract all nodes that are present in the edges table into a new graph_nodes table;
  • after that I probably could generate a 'roads' graph from 'edges' and 'graph_nodes' tables.

Question:

How do I correctly parse an .osm file into a database to have 'edges' and 'nodes' tables as a result?

How can I extract only drivable roads from .osm into a database?

How do I do it fast (I mean hours of scripts running for millions of nodes and edges tables, not weeks)?

Should I take relations into account?

If yes, how should I do that?

like image 428
Kirill Avatar asked Nov 15 '22 00:11

Kirill


1 Answers

As I mentioned in my comment, where is really no need to insert the edges that are blacklisted. Also, another problem comes to my mind. Why do you need to delete nodes from the nodes table anyway? You can add a special flag field (let's call it used) in the nodes table.

When you insert a new row to the nodes table, this field is set to false. Also you can implement a trigger for insert in edges table, that for every inserted row the appropriate rows in the nodes table are marked as used.

After you are done with that, you can just delete all the unused nodes. That is, I suppose right now you use something like this:

delete from nodes where id not in (select start_node from edges);
delete from nodes where id not in (select end_node from edges);

And I suggest you do the following:

delete from nodes where used = false;
like image 109
SPIRiT_1984 Avatar answered Dec 28 '22 07:12

SPIRiT_1984