What machine learning algorithm is appropriate for predicting one time-series from another?

You are a plane tracking an enemy ship that travels across the ocean, so you have collected a series of (x,y,time) coordinates of the ship. You know that a hidden submarine travels with the ship to protect it, but while there is a correlation between their positions, the submarine often wanders off from the ship, so while it's often near it, it can also be on the other side of the world occasionally. You want to predict the path of the submarine, but unfortunately it is hidden from you.

But one April you notice the submarine forgets to hide itself, so you have a series of coordinates for both the submarine and the ship throughout 1,000 trips. Using this data, you'd like to build a model to predict the hidden submarine's path given just the ship's movements. The naive baseline would be to say "submarine position guess = ship's current position", but from the April data where the submarine was visible, you notice there is a tendency for the submarine to be a bit ahead of the ship, so "submarine position guess = ship's position in 1 minute" is an even better estimate. Furthermore, the April data shows that when the ship pauses in the water for an extended period, the submarine is likely to be far away patrolling the coastal waters. There are other patterns, of course.
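The "ship's position in 1 minute" baseline can be fitted rather than guessed. The following is only a minimal sketch, assuming evenly sampled tracks and a single coordinate (for (x, y) data you could run it per coordinate or sum the squared distances); the function name `best_lead` and the toy numbers are made up for illustration.

```python
def best_lead(ship, sub, max_lead):
    """Return (lead, mse) for the shift of `ship` that best matches `sub`.

    A positive lead k compares sub[t] with ship[t + k], i.e. the
    submarine sits where the ship will be k samples from now.
    """
    best = None
    for k in range(-max_lead, max_lead + 1):
        # Keep only the time steps where both sub[t] and ship[t + k] exist.
        pairs = [(sub[t], ship[t + k])
                 for t in range(len(sub))
                 if 0 <= t + k < len(ship)]
        if not pairs:
            continue
        mse = sum((s - p) ** 2 for s, p in pairs) / len(pairs)
        if best is None or mse < best[1]:
            best = (k, mse)
    return best


# Toy 1-D example (made-up numbers): the submarine is two samples ahead.
ship = [float(t * t) for t in range(20)]
sub = [float((t + 2) ** 2) for t in range(20)]
lead, mse = best_lead(ship, sub, max_lead=5)
```

On the April data, the lead with the lowest error would replace the hand-picked "1 minute"; it should be chosen on some trips and checked on held-out trips.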

How would you build this model, given the April data as training data, to predict the submarine's path? My current solution is an ad-hoc linear regression where the factors are "trip time", "cargo ship's x coordinate", "was cargo ship idle for 1 day", etc. and then having R figure out the weights and doing a cross-validation. But I would really love a way to generate these factors automatically from the April data. Also, a model that uses sequence or time would be nice, since the linear regression doesn't and I think it's relevant.
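One possible way to generate such factors automatically is to build lagged copies of the ship's track plus an "idle" flag, and let the regression pick the weights. This is a rough sketch under assumptions: one coordinate, evenly sampled data, and a hypothetical helper `lagged_features` whose lag set and idle window are arbitrary choices, not something given in the question.

```python
def lagged_features(ship, lags, idle_window, idle_tol=0.0):
    """Build one feature row per time step from a 1-D ship track.

    Each row holds ship[t + k] for every k in `lags`, plus a 0/1 flag
    that is 1 when the ship moved at most `idle_tol` over the previous
    `idle_window` samples. Time steps whose lags or idle window would
    fall outside the track are skipped. Returns (times, rows).
    """
    lo = max(idle_window, max(-k for k in lags), 0)
    hi = len(ship) - max(max(lags), 0)
    times, rows = [], []
    for t in range(lo, hi):
        window = ship[t - idle_window:t + 1]
        idle = 1.0 if max(window) - min(window) <= idle_tol else 0.0
        rows.append([ship[t + k] for k in lags] + [idle])
        times.append(t)
    return times, rows


# Toy track (made-up numbers): the ship pauses at position 2 for a while.
ship = [0, 1, 2, 2, 2, 2, 3, 4]
times, rows = lagged_features(ship, lags=(-1, 0, 1), idle_window=2)
```

The rows line up with the submarine positions at `times`, so they can be handed to any regression (including the existing one in R). Positive lags use the ship's future positions, which is only possible when predicting after the trip is recorded.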

Edit: I've reformulated the problem with a made-up story so it's less confusing. The original problem I posted is:

I have eye-tracking data on two subjects -- a teacher and a student. It's in the form (x, y, time), so there is a series of these for each subject. What the teacher looks at influences what the student looks at. What method would I use to predict what the student is looking at, using only teacher data? Let's say I can train some learning algorithm using a gold-standard set of student and teacher data.

I was thinking a hidden Markov model would be appropriate, given the definition in Wikipedia, but I am not sure how to put this into practice on my dataset.

More detail: I have data about how a teacher and student each look at a map and some readings. I have 40 of these datasets, which look like [(366,234,0), (386,234,5), ...], which means the teacher looked at point (366,234) at time 0 and then 5 seconds later moved to look at coordinate (386,234). I want to learn a model of the relationship between how a teacher looks at content, to predict how a student will look at the same content. So maybe the student looks at the content in the same order as the teacher but slower. Or perhaps the student doesn't look around as much but the teacher scans more of the content. I have both sets of data and want to see how accurate a model I can get -- would I be able to predict the student's looking behavior within 50px of the teacher's looking behavior?

user2077851 asked Feb 16 '13 07:02


1 Answer

I'd suggest looking at Kalman filters, or, more generally, state-space models (SSMs), which are defined by the book recommended below as "just like an HMM, except the hidden states are continuous".
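To give a feel for the idea, here is a minimal sketch of a scalar Kalman filter. It assumes a deliberately simple model that is not from the question: the hidden state (say, one coordinate of the submarine) follows a random walk, and the observation (the same coordinate of the ship) is the hidden state plus noise. The function name `kalman_1d` and the variances are illustrative; in practice the two variances would be estimated from the training data where both tracks are visible.

```python
def kalman_1d(observations, process_var, obs_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk hidden state.

    Model (an assumption for illustration):
        hidden[t]   = hidden[t-1] + noise with variance process_var
        observed[t] = hidden[t]   + noise with variance obs_var
    Returns the filtered mean of the hidden state at each step.
    """
    x, p = x0, p0
    means = []
    for z in observations:
        # Predict: the random walk keeps the mean, uncertainty grows.
        p = p + process_var
        # Update: blend prediction and observation by the Kalman gain.
        gain = p / (p + obs_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        means.append(x)
    return means


# Toy input (made-up numbers): a constant observation of 5.0.
estimates = kalman_1d([5.0] * 50, process_var=0.01, obs_var=1.0)
```

The estimate starts at the prior mean and is pulled toward the observations, faster when `obs_var` is small relative to `process_var`. A real model for this problem would need a richer state (2-D position, velocity, perhaps a lead time), which is where the matrix form of the filter comes in.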

I can recommend a book chapter on the topic: chapter 18 in Kevin P. Murphy's "Machine Learning: A Probabilistic Perspective". There are also online resources (look up Kalman filters), but I can't recommend any specific one.

EDIT: you can find references here for using Kalman filters with R to predict time series.

Hope this helps,

etov answered Sep 18 '22 12:09