Periodic Data with Machine Learning (like degree angles: 179 is only 2 away from -179)

I'm using Python for kernel density estimation and Gaussian mixture models to rank the likelihood of samples of multidimensional data. Every value in the data is an angle, and I'm not sure how to handle the periodicity of angular data in machine learning.

First, I removed all negative angles by adding 360 to them, so every negative angle became positive (-179 became 181, for example). I believe this neatly handles cases like -179 being close to 179, since 181 and 179 differ by only 2, but it does not handle cases like 359 being close to 1.
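A small NumPy sketch of that shift shows both halves of the problem. The helper name `shift_negative` is hypothetical and only mirrors the "add 360 to negative angles" step described above:

```python
import numpy as np

def shift_negative(deg):
    """Add 360 to negative angles so everything lies in [0, 360)."""
    deg = np.asarray(deg, dtype=float)
    return np.where(deg < 0, deg + 360.0, deg)

# -179 becomes 181, so it now sits only 2 degrees from 179 ...
print(abs(shift_negative(-179) - shift_negative(179)))  # 2.0
# ... but 359 and 1 still look 358 degrees apart on this linear scale.
print(abs(shift_negative(359) - shift_negative(1)))  # 358.0
```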

One approach I've considered is keeping both the original negative values and their +360 versions and using whichever gives the smaller difference, but this would require modifying the machine learning algorithms themselves.
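The "use the smaller of the two" idea amounts to a wrapped (circular) angle difference. A minimal NumPy sketch for illustration, with `angular_difference` as a hypothetical helper name. Note that it is a distance between pairs of angles, not a preprocessing step, which is why using it inside KDE or a GMM would mean changing the algorithms:

```python
import numpy as np

def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles in degrees, in [0, 180]."""
    d = np.abs(np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float)) % 360.0
    # Going the other way around the circle may be shorter.
    return np.minimum(d, 360.0 - d)

print(angular_difference(359, 1))     # 2.0
print(angular_difference(-179, 179))  # 2.0
print(angular_difference(90, 270))    # 180.0
```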

Is there a good preprocessing-only solution to this problem? Anything built into SciPy or scikit-learn?

Thanks!

calben asked Dec 04 '13 17:12

1 Answer

As Tal Darom wrote in the comments, you can replace every periodic feature x with two features, cos(x) and sin(x), after converting it to radians. That solves the 359 ≈ 1 problem:

>>> import numpy as np
>>> def fromdeg(d):
...     r = d * np.pi / 180.
...     return np.array([np.cos(r), np.sin(r)])
... 
>>> np.linalg.norm(fromdeg(1) - fromdeg(359))
0.03490481287456796
>>> np.linalg.norm(fromdeg(1) - fromdeg(180))
1.9999238461283426
>>> np.linalg.norm(fromdeg(90) - fromdeg(270))
2.0

norm(a - b) is the good old Euclidean distance between vectors a and b. These (cos, sin) pairs are really coordinates on the unit circle: for angles α and β, the distance between their encodings is sqrt(2 - 2·cos(α - β)) = 2·|sin((α - β)/2)|, and their dot product is cos(α - β). As you can verify with a simple plot, this distance is therefore maximal (and the dot product minimal) when the original angles differ by 180°.
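For multidimensional data like the question's, each angle column becomes two columns. A minimal NumPy sketch, assuming the angles are in degrees in an array of shape (n_samples, n_angles); `encode_angles` is a hypothetical helper name, not a SciPy or scikit-learn function:

```python
import numpy as np

def encode_angles(deg):
    """Replace each angle column (degrees) with cos and sin columns.

    deg: array of shape (n_samples, n_angles); a 1-D input is treated as a
    single angle column. Returns shape (n_samples, 2 * n_angles): the cosines
    of all columns followed by the sines of all columns.
    """
    rad = np.deg2rad(np.asarray(deg, dtype=float))
    if rad.ndim == 1:
        # Treat a 1-D input as one angle feature.
        rad = rad[:, np.newaxis]
    return np.hstack([np.cos(rad), np.sin(rad)])

# Three made-up samples, each with two angle features (degrees).
samples = np.array([[359.0, 10.0],
                    [1.0, 10.0],
                    [180.0, 10.0]])
X = encode_angles(samples)
print(X.shape)                      # (3, 4)
print(np.linalg.norm(X[0] - X[1]))  # small (about 0.035): 359 and 1 are neighbours
print(np.linalg.norm(X[0] - X[2]))  # close to 2: 359 and 180 are nearly opposite
```

The encoded array, rather than the raw angles, is what would be passed to an estimator such as scikit-learn's `KernelDensity` or `GaussianMixture`; both provide `score_samples`, which returns per-sample log-densities that can be used for ranking. A KDE bandwidth then applies to distances in the (cos, sin) space rather than to degrees.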

Fred Foo answered Oct 01 '22 17:10