Given an array of 0 and 1, e.g. array[] = {0, 1, 0, 0, 0, 1, ...}
, how I can predict what the next value will be with the best possible accuracy?
What kind of methods are best suited for this kind of task?
The prediction method would depend on the interpretation of data.
However, it looks like in this particular case we can make some general assumptions that might justify use of certain machine learning techniques.
This is a pretty common scenario in many machine learning contexts. One example is the prediction of stock prices based on history.
Now, to build the predictive model you'll need to define the training data set. Assume our model looks at the last k
values. In case if k=1
, we might end up with something similar to a Markov chain model.
Our training data set will consist of k-dimensional data points together with their respective dependent values. For example, suppose k=3
and we have the following input data
0,0,1,1,0,1,0,1,1,1,1,0,1,0,0,1...
We'll have the following training data:
(0,0,1) -> 1
(0,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 1
(1,0,1) -> 1
(0,1,1) -> 1
(1,1,1) -> 1
(1,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 0
(1,0,0) -> 1
Now, let's say you want to predict the next value in the sequence. The last 3 values are 0,0,1, so the model must predict the value of the function at (0,0,1), based on the training data.
A popular and relatively simple approach would be to use a multivariate linear regression on a k-dimensional data space. Alternatively, consider using a neural network if linear regression underfits the training data set.
You might need to try out different values of k
and test against your validation set.
You could use a maximum likelihood estimator for the Bernoulli distribution. In essence you would:
In Python this could look like this:
#!/usr/bin/env python
from __future__ import division
signal = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0]
def maximum_likelihood(s, last=None):
"""
The maximum likelihood estimator selects the parameter value which gives
the observed data the largest possible probability.
http://mathworld.wolfram.com/MaximumLikelihood.html
If `last` is given, only use the last `n` values.
"""
if not last:
return sum(s) / len(s)
return sum(s[:-last]) / last
if __name__ == '__main__':
hits = []
print('p\tpredicted\tcorrect\tsignal')
print('-\t---------\t-------\t------')
for i in range(1, len(signal) - 1):
p = maximum_likelihood(signal[:i]) # p = maximum_likelihood(signal[:i], last=2)
prediction = int(p >= 0.5)
hits.append(prediction == signal[i])
print('%0.3f\t%s\t\t%s\t%s' % (
p, prediction, prediction == signal[i], signal[:i]))
print('accuracy: %0.3f' % (sum(hits) / len(hits)))
The output would like this:
# p predicted correct signal
# - --------- ------- ------
# 1.000 1 False [1]
# 0.500 1 True [1, 0]
# 0.667 1 True [1, 0, 1]
# 0.750 1 False [1, 0, 1, 1]
# 0.600 1 False [1, 0, 1, 1, 0]
# 0.500 1 True [1, 0, 1, 1, 0, 0]
# 0.571 1 False [1, 0, 1, 1, 0, 0, 1]
# 0.500 1 True [1, 0, 1, 1, 0, 0, 1, 0]
# 0.556 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1]
# 0.600 1 False [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
# 0.545 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
# 0.583 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
# 0.615 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
# 0.643 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
# 0.667 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]
# 0.688 1 False [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
# 0.647 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
# 0.667 1 False [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
# 0.632 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0]
# 0.650 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1]
# accuracy: 0.650
You could vary the window size for performance reasons or to favor recent events.
In above example, if we would estimate the the next value by looking only at the last 3 observed values, we could increase our accuracy to 0.7.
Update: Inspired by Narek's answer I added a logistic regression classifier example to the gist.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With