line (travel path) clustering machine learning algorithm [closed]

I have a series of line data (each line is 2-3 connected points). What is the best machine learning algorithm for grouping lines by how similar their locations are? (image below)

Preferably with Python libraries such as scikit-learn.

(Linked image not included.)

Edit: I have tried DBSCAN, but the problem I ran into is that when two lines intersect, DBSCAN sometimes puts them in one group even though they run in completely different directions.

Here is the solution I have found so far:

GeoPath Clustering Algorithm

The idea here is to group geo paths that travel very similarly to each other.

Steps:

1- Cluster the lines based on slope

2- Within each cluster from step 1, find the centroid of each line and use the k-means algorithm to cluster the centroids into smaller groups

3- Within each group from step 2, calculate the length of each line and group the lines that fall within a defined length threshold

The result is small groups of lines that have a similar slope, are close to each other, and cover a similar travel distance.
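The three steps above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the asker's actual code: the names `features`, `kmeans` and `cluster_paths`, the 15-degree slope bins, the fixed `k` per slope bin and the length tolerance are all hypothetical choices. It sticks to the standard library so it runs anywhere; in practice `sklearn.cluster.KMeans` would replace the small `kmeans` helper.

```python
import math
from collections import defaultdict

def features(line):
    """Return (angle in [0, 180), centroid, length) for one polyline."""
    (x0, y0), (x1, y1) = line[0], line[-1]
    # Angle modulo 180 behaves like slope: opposite travel directions match.
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
    cx = sum(p[0] for p in line) / len(line)
    cy = sum(p[1] for p in line) / len(line)
    length = sum(math.hypot(b[0] - a[0], b[1] - a[1])
                 for a, b in zip(line, line[1:]))
    return angle, (cx, cy), length

def kmeans(points, k, iters=20):
    """Tiny Lloyd's k-means, seeded with the first k points (deterministic)."""
    k = min(k, len(points))
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: math.hypot(p[0] - centers[c][0],
                                               p[1] - centers[c][1]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels

def cluster_paths(lines, angle_bin=15.0, k=2, length_tol=1.0):
    """Return groups of line indices following steps 1-3."""
    feats = [features(line) for line in lines]
    # Step 1: bin by slope angle (note: bins do not wrap around 0/180).
    by_angle = defaultdict(list)
    for i, (angle, _, _) in enumerate(feats):
        by_angle[int(angle // angle_bin)].append(i)
    groups = []
    for idxs in by_angle.values():
        # Step 2: k-means on the centroids inside each slope bin.
        labels = kmeans([feats[i][1] for i in idxs], k)
        by_pos = defaultdict(list)
        for i, lab in zip(idxs, labels):
            by_pos[lab].append(i)
        # Step 3: split each spatial group by length.
        for members in by_pos.values():
            members.sort(key=lambda i: feats[i][2])
            current = [members[0]]
            for i in members[1:]:
                if feats[i][2] - feats[current[0]][2] <= length_tol:
                    current.append(i)
                else:
                    groups.append(current)
                    current = [i]
            groups.append(current)
    return groups

lines = [
    [(0, 0), (10, 1)],      # near-horizontal
    [(0, 0.5), (10, 1.5)],  # near-horizontal, right next to the first
    [(0, 20), (10, 21)],    # near-horizontal, far away
    [(5, -5), (6, 5)],      # steep line crossing the first two
    [(0, 0.2), (3, 0.5)],   # near-horizontal and nearby, but much shorter
]
groups = cluster_paths(lines)
```

With this toy input only the first two lines end up together. The crossing line is separated at step 1, the distant one at step 2 and the short one at step 3.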

Here are screenshots of the visualization: yellow lines are all the lines, and red lines are clusters of paths that travel together.

asked Jul 29 '16 21:07

user2146024

1 Answer

I'll throw in an answer, since I think the current one is incomplete, and I also think the "simple heuristic" comment is premature. If you cluster on the raw points, I think you'll get a different result from what your diagram depicts: the clusters will sit near the end-points, and you won't get your nice ellipses.

So, if your data really does behave the way you display it, I would take a stab at turning each set of 2-3 points into a longer list of points that traces out the line. (You will need to experiment with how dense to make it.)
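A minimal sketch of that resampling step, assuming plain (x, y) tuples. The function names `densify` and `densify_all` and the `spacing` parameter are mine, not from any library. The second helper also records which line each point came from, which you will want later when mapping point clusters back to lines.

```python
import math

def densify(line, spacing):
    """Resample a polyline so consecutive points are at most `spacing` apart."""
    points = [tuple(line[0])]
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        # Number of equal steps needed to cover this segment.
        n = max(1, math.ceil(seg_len / spacing))
        for i in range(1, n + 1):
            t = i / n
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points

def densify_all(lines, spacing):
    """Return all resampled points plus the index of the line each came from."""
    points, owners = [], []
    for idx, line in enumerate(lines):
        for p in densify(line, spacing):
            points.append(p)
            owners.append(idx)
    return points, owners

pts = densify([(0, 0), (4, 0), (4, 2)], spacing=1.0)
all_pts, owners = densify_all([[(0, 0), (2, 0)], [(0, 5), (0, 6)]], spacing=1.0)
```

The `points` list is what would be fed to the clustering step. A smaller `spacing` gives denser lines at the cost of more points to cluster.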

Then run HDBSCAN on the result to get your clusters (see this video: https://www.youtube.com/watch?v=AgPQ76RIi6A). I believe "pip install hdbscan" installs it.

Now, when testing a new sample, first decompose it into many (N) points and assign each of them to a cluster with your fitted HDBSCAN model. I reckon that if you take a majority vote over the N points, you'll get the best overall cluster for the "line".
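The voting part could look something like this. It is an untested-in-production sketch: `majority_cluster` is a hypothetical helper, and it assumes you already have one cluster label per point, for example from `hdbscan.approximate_predict` on a clusterer fitted with `prediction_data=True`. Points labelled -1 (noise) are ignored unless every point is noise.

```python
from collections import Counter

NOISE = -1  # label DBSCAN/HDBSCAN give to points outside every cluster

def majority_cluster(point_labels, noise=NOISE):
    """Pick the cluster that most of a line's points were assigned to."""
    votes = Counter(lab for lab in point_labels if lab != noise)
    if not votes:
        # Every point was noise, so the whole line is treated as noise.
        return noise
    # On a tie, Counter keeps the label that was seen first.
    label, _ = votes.most_common(1)[0]
    return label

line_label = majority_cluster([2, 2, -1, 0, 2, -1])
all_noise = majority_cluster([-1, -1])
```

You may also want to look at the share of votes the winner got, since a line whose points are split evenly between two clusters probably belongs to neither.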

So, while I sort of agree with the "simple heuristic" comment, it's not so simple if you want the whole thing automated. And once you watch the video, you may be convinced that HDBSCAN, being density-based, will suit this problem (if you decide to create many points from each sample).

I'll wrap up by saying that I'm sure there are line-intersection models that have done this before, and that there do exist heuristics and rules that can do the job. They're likely computationally more economical too. My answer is just something organic using sklearn as you requested, and I haven't even tested it! It's just how I would proceed if I were in your shoes.

Edit:

I poked around, and there are a couple of line similarity measures you could try: the Fréchet and Hausdorff distances.

Fréchet: http://arxiv.org/pdf/1307.6628.pdf

Hausdorff: see the question "distance matrix of curves in python" for a Python example.

If you generate all pairwise similarities and then group the lines according to similarity and/or into N bins, you can call those bins your "clusters" (not k-means clusters, though!). For each new line, generate its similarity to all existing lines and see which bin it belongs to. I take back my earlier comment about this possibly being computationally less intensive... you're lucky your lines only have 2 or 3 points!
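Here is a rough sketch of the pairwise idea using the Hausdorff distance. A few assumptions to flag: this is the discrete version computed on the vertices only (SciPy's `scipy.spatial.distance.directed_hausdorff` does the one-directional part for arrays), the grouping rule is a simple threshold with single linkage that I picked for illustration, and `group_by_distance` and `threshold` are my own names.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point of `b`."""
    return max(min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in b)
               for p in a)

def hausdorff(a, b):
    """Symmetric discrete Hausdorff distance between two point lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def group_by_distance(lines, threshold):
    """Single-linkage grouping: link any two lines within `threshold`."""
    parent = list(range(len(lines)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances: O(n^2) comparisons.
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            if hausdorff(lines[i], lines[j]) <= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(lines)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

lines = [
    [(0, 0), (10, 0)],   # horizontal
    [(0, 1), (10, 1)],   # parallel to the first, one unit away
    [(5, -5), (5, 5)],   # vertical line crossing both
]
groups = group_by_distance(lines, threshold=2.0)
```

In this toy case the crossing line lands in its own group even though it intersects the other two, which is the behaviour the question was missing from DBSCAN on raw points. With only 2-3 vertices per line the discrete distance is coarse, so resampling the lines into more points first would bring it closer to the continuous distance.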

answered Oct 20 '22 03:10

user1269942

Tags: python, machine-learning, classification, line, scikit-learn
