How I can find the next value? [closed]

Question

Given an array of 0 and 1, e.g. array[] = {0, 1, 0, 0, 0, 1, ...}, how I can predict what the next value will be with the best possible accuracy?

What kind of methods are best suited for this kind of task?

Narek Saribekyan · Accepted Answer

The prediction method would depend on the interpretation of data.

However, it looks like in this particular case we can make some general assumptions that might justify use of certain machine learning techniques.

Values are generated one after another in chronological order
Values depend on some (possibly non-observable) external state. If the state repeats itself, so do the values.

This is a pretty common scenario in many machine learning contexts. One example is the prediction of stock prices based on history.

Now, to build the predictive model you'll need to define the training data set. Assume our model looks at the last k values. In case if k=1, we might end up with something similar to a Markov chain model.

Our training data set will consist of k-dimensional data points together with their respective dependent values. For example, suppose k=3 and we have the following input data

0,0,1,1,0,1,0,1,1,1,1,0,1,0,0,1...

We'll have the following training data:

(0,0,1) -> 1
(0,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 1
(1,0,1) -> 1
(0,1,1) -> 1
(1,1,1) -> 1
(1,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 0
(1,0,0) -> 1

Now, let's say you want to predict the next value in the sequence. The last 3 values are 0,0,1, so the model must predict the value of the function at (0,0,1), based on the training data.

A popular and relatively simple approach would be to use a multivariate linear regression on a k-dimensional data space. Alternatively, consider using a neural network if linear regression underfits the training data set.

You might need to try out different values of k and test against your validation set.

miku · Answer

You could use a maximum likelihood estimator for the Bernoulli distribution. In essence you would:

look at all observed values and estimate parameter p
then use p to determine the next value

In Python this could look like this:

#!/usr/bin/env python

from __future__ import division

signal = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0]

def maximum_likelihood(s, last=None):
    """
    The maximum likelihood estimator selects the parameter value which gives
    the observed data the largest possible probability.

    http://mathworld.wolfram.com/MaximumLikelihood.html

    If `last` is given, only use the last `n` values.
    """
    if not last:
        return sum(s) / len(s)
    return sum(s[:-last]) / last

if __name__ == '__main__':
    hits = []

    print('p	predicted	correct	signal')
    print('-	---------	-------	------')

    for i in range(1, len(signal) - 1):
        p = maximum_likelihood(signal[:i]) # p = maximum_likelihood(signal[:i], last=2)
        prediction = int(p >= 0.5)
        hits.append(prediction == signal[i])
        print('%0.3f	%s		%s	%s' % (
            p, prediction, prediction == signal[i], signal[:i]))

    print('accuracy: %0.3f' % (sum(hits) / len(hits)))

The output would like this:

# p       predicted  correct signal
# -       ---------  ------- ------
# 1.000   1          False   [1]
# 0.500   1          True    [1, 0]
# 0.667   1          True    [1, 0, 1]
# 0.750   1          False   [1, 0, 1, 1]
# 0.600   1          False   [1, 0, 1, 1, 0]
# 0.500   1          True    [1, 0, 1, 1, 0, 0]
# 0.571   1          False   [1, 0, 1, 1, 0, 0, 1]
# 0.500   1          True    [1, 0, 1, 1, 0, 0, 1, 0]
# 0.556   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1]
# 0.600   1          False   [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
# 0.545   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
# 0.583   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
# 0.615   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
# 0.643   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
# 0.667   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]
# 0.688   1          False   [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
# 0.647   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
# 0.667   1          False   [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
# 0.632   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0]
# 0.650   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1]
# accuracy: 0.650

You could vary the window size for performance reasons or to favor recent events.

In above example, if we would estimate the the next value by looking only at the last 3 observed values, we could increase our accuracy to 0.7.

Update: Inspired by Narek's answer I added a logistic regression classifier example to the gist.

How I can find the next value? [closed]

Tags:

algorithm

machine-learning

MATH000

2 Answers

Narek Saribekyan

miku

Recent Activity

Donate For Us

How I can find the next value? [closed]

Tags:

algorithm

machine-learning

MATH000

2 Answers

Narek Saribekyan

miku

Related questions

Recent Activity

Donate For Us