We have several very high quality road networks available from multiple sources (OpenStreetMap, TomTom, ...). These sources contain far more information than we need, which effectively bogs down our calculations. Filtering out secondary roads is easy. Our main problems are the representation of highways (two separate roads running in opposite directions) and complicated highway crossings (various exit roads; junctions are not single points). For our purposes, a more "topological" style of road network would be ideal.
Highly detailed data source:
The ideal simplified network:
Are there any algorithms that would help us extract the simplified road network? If an implementation is available for .NET, that would be a real winner.
UPDATE:
The original data is represented as polylines with some limited metadata attached. The metadata tells us the identity of the road (name or number), the 'rank' of the road (highway, primary, secondary, etc.), and some more details such as the speed limit and whether a line section is a bridge or tunnel. The quality of the data is very good: we can easily group together the polyline segments that form a road based on the road identity. Similarly, it is very easy to ignore the secondary roads. Acceleration/deceleration lanes at highway exits are also clearly marked by their rank, so they are easily filterable too.
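The filtering and grouping step described above can be sketched as follows. This is an illustration only, in Python rather than .NET; the field names (`road_id`, `rank`, `points`) are assumptions standing in for the real metadata attributes.

```python
# Hypothetical sketch: keep only major roads and group their polyline
# segments by road identity. Field names are assumptions, not real schema.
from collections import defaultdict

KEEP_RANKS = {"highway", "primary"}  # ranks we want to keep

def group_major_roads(segments):
    """segments: list of dicts with 'road_id', 'rank', 'points'.
    Returns {road_id: [list of point lists]} for major roads only."""
    roads = defaultdict(list)
    for seg in segments:
        if seg["rank"] in KEEP_RANKS:
            roads[seg["road_id"]].append(seg["points"])
    return dict(roads)

segments = [
    {"road_id": "A1", "rank": "highway", "points": [(0, 0), (1, 0)]},
    {"road_id": "A1", "rank": "highway", "points": [(1, 0), (2, 0)]},
    {"road_id": "X9", "rank": "secondary", "points": [(0, 1), (1, 1)]},
]
print(group_major_roads(segments))  # only A1 survives, with two segments
```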
We see two main problems:
1) Highways: replace the two (or more) one-way roads with a single road.
2) Highway junctions: identify the center point of each junction, and make sure that our simplified highways connect to it.
UPDATE 2: The data is stored in Esri shapefiles. Using the SharpMap library, they are relatively easy to parse and to search geospatially. The source data is segmented by country: one country is one shapefile, and if the country is large (like the US or Germany), it is further divided into smaller regions. And yes, this division brings up a further problem: how do we make sure that the simplified highways on the border of France and Germany meet each other?
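One common way to handle the cross-border problem is endpoint snapping: collect the endpoints of the simplified highways from both files and merge any that lie within a small tolerance. A minimal sketch (the tolerance value is an assumption and would depend on the coordinate system):

```python
# Hedged sketch: snap together road endpoints from different country files
# that lie within a small tolerance, so lines share an exact common point.
import math

def snap_endpoints(endpoints, tol=0.01):
    """Greedily cluster endpoints closer than tol and move each cluster
    to its centroid. Returns {original_point: snapped_point}."""
    clusters = []  # list of lists of member points
    for p in endpoints:
        for c in clusters:
            if math.dist(p, c[0]) < tol:
                c.append(p)
                break
        else:
            clusters.append([p])
    snapped = {}
    for c in clusters:
        cx = sum(x for x, _ in c) / len(c)
        cy = sum(y for _, y in c) / len(c)
        for p in c:
            snapped[p] = (cx, cy)
    return snapped

# a French endpoint and a German endpoint that should coincide
m = snap_endpoints([(7.5890, 48.9700), (7.5891, 48.9701), (10.0, 50.0)])
print(m[(7.5890, 48.9700)] == m[(7.5891, 48.9701)])  # True
```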
Thanks for your attention.
This is only a sketch of a solution, but:
Define a distance metric between pairs of curves. The first one that comes to mind is the area enclosed by the two curves, divided by the sum of their lengths. You could augment this with your metadata. The goal is to design a metric that gives a small distance to pairs of roads you consider similar, and a large one to pairs you consider dissimilar.
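A rough sketch of that metric, assuming the roads are polylines: close the two curves into a single ring (curve A followed by reversed curve B), take the absolute area via the shoelace formula, and divide by the combined length. For two parallel carriageways this comes out to roughly half their separation.

```python
# Sketch of the proposed curve distance: enclosed area / combined length.
import math

def length(curve):
    return sum(math.dist(p, q) for p, q in zip(curve, curve[1:]))

def curve_distance(a, b):
    ring = a + b[::-1]                      # close the two curves into one ring
    area2 = sum(x1 * y2 - x2 * y1          # shoelace formula (twice the area)
                for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]))
    return abs(area2) / 2 / (length(a) + length(b))

left  = [(0.0, 0.0), (10.0, 0.0)]
right = [(0.0, 1.0), (10.0, 1.0)]          # parallel carriageway 1 unit away
print(curve_distance(left, right))         # 0.5: half the separation
```

Note that one carriageway usually runs in the opposite direction, so reversing one curve before comparison (as done here) is exactly what you want for dual carriageways.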
Now pick a clustering algorithm and ask it to cluster the roads based on the distance you just defined. Be very generous with the number of clusters you let it use. When it comes back, look for clusters with very low "diameter", meaning that every point in the cluster is very similar to every other. "Complete linkage clustering" is probably a good place to start your research, as it leads to exactly that kind of cluster.
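To make the clustering step concrete, here is a minimal pure-Python sketch of complete-linkage agglomerative clustering with a distance threshold; a real implementation could use a library routine such as SciPy's hierarchical clustering with `method="complete"` instead. The 1-D toy data stands in for roads under your curve metric.

```python
# Minimal complete-linkage clustering: merge clusters while the *maximum*
# pairwise distance between two clusters stays below a threshold.
def complete_linkage(items, dist, threshold):
    clusters = [[i] for i in range(len(items))]

    def cluster_dist(a, b):
        return max(dist(items[i], items[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, ai, bi = min(((cluster_dist(a, b), ai, bi)
                         for ai, a in enumerate(clusters)
                         for bi, b in enumerate(clusters) if ai < bi),
                        key=lambda t: t[0])
        if d >= threshold:
            break                           # no remaining pair is similar enough
        clusters[ai] = clusters[ai] + clusters[bi]
        del clusters[bi]
    return clusters

# toy example: items 0.0 and 0.2 cluster together, 5.0 stays alone
points = [0.0, 0.2, 5.0]
print(complete_linkage(points, lambda a, b: abs(a - b), threshold=1.0))
```

Complete linkage guarantees exactly the low-diameter property mentioned above: every member of a cluster is within the threshold of every other member.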
You can then take the average within each of these clusters to turn collections of very similar roads into a single road, solving your problem (1) (and hopefully (2) too).
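Averaging a cluster of polylines needs them to have matching vertices first. One simple sketch, under the assumption that the curves in a cluster run in the same direction (reverse them beforehand if not): resample each polyline to the same number of points by arc length, then average pointwise.

```python
# Sketch: collapse a cluster of similar polylines into one average curve.
import math

def resample(curve, n):
    """Resample a polyline to n points, evenly spaced by arc length."""
    seg = [math.dist(p, q) for p, q in zip(curve, curve[1:])]
    total = sum(seg)
    out = []
    for k in range(n):
        target = total * k / (n - 1)
        acc, i = 0.0, 0
        while i < len(seg) and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        if i == len(seg):
            out.append(curve[-1])
            continue
        t = (target - acc) / seg[i] if seg[i] else 0.0
        (x1, y1), (x2, y2) = curve[i], curve[i + 1]
        out.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return out

def average_roads(curves, n=50):
    sampled = [resample(c, n) for c in curves]
    return [(sum(p[0] for p in pts) / len(pts),
             sum(p[1] for p in pts) / len(pts))
            for pts in zip(*sampled)]

# the two carriageways collapse onto the centerline y = 0.5
center = average_roads([[(0, 0), (10, 0)], [(0, 1), (10, 1)]], n=3)
print(center)  # [(0.0, 0.5), (5.0, 0.5), (10.0, 0.5)]
```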
That done, the next task is to distinguish "important" roads from "unimportant" ones. The best approach here would be to sit down and build a training set of a couple of hundred random roads, manually labelling each as important or not. Then take a classifier of some sort, train it on your manually labelled set, and ask it to predict which of the remaining roads are important.
I can't say which classifier would be best to use, but if you can spare the time to build a large training set and study the literature, "neural networks" can give some impressive results. If you want something simpler, look at "random forests" or (even simpler) "logistic regression".
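As a toy illustration of the simplest option, here is logistic regression trained by stochastic gradient descent. The features (a normalized length and a rank score) and the labels are made up for the example; in practice you would use whatever metadata you have, or a library implementation rather than hand-rolled training.

```python
# Toy logistic regression: label 1 = "important" road, 0 = "unimportant".
import math

def train(X, y, lr=0.5, steps=2000):
    w = [0.0] * (len(X[0]) + 1)             # feature weights + bias (last slot)
    for _ in range(steps):
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1 / (1 + math.exp(-z))      # sigmoid
            g = p - yi                      # gradient of the log loss
            for j in range(len(xi)):
                w[j] -= lr * g * xi[j]
            w[-1] -= lr * g
    return w

def predict(w, xi):
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
    return 1 / (1 + math.exp(-z)) > 0.5

# features: (normalized length, rank score); labels are hand-made examples
X = [(0.9, 1.0), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
y = [1, 1, 0, 0]
w = train(X, y)
print(predict(w, (0.85, 0.95)), predict(w, (0.15, 0.15)))  # True False
```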