How can I find the next value? [closed]

Given an array of 0s and 1s, e.g. array[] = {0, 1, 0, 0, 0, 1, ...}, how can I predict what the next value will be with the best possible accuracy?

What kinds of methods are best suited to this task?

asked Jan 06 '23 by MATH000

2 Answers

The choice of prediction method depends on how the data is interpreted.

However, in this particular case we can make some general assumptions that might justify the use of certain machine learning techniques:

  1. Values are generated one after another in chronological order.
  2. Values depend on some (possibly non-observable) external state. If the state repeats itself, so do the values.

This is a pretty common scenario in many machine learning contexts. One example is the prediction of stock prices based on history.

Now, to build the predictive model you'll need to define the training data set. Assume our model looks at the last k values. If k=1, we end up with something similar to a first-order Markov chain model (sketched below).
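
For the k=1 case, a minimal sketch of the Markov chain idea in plain Python (the sequence is the example used later in this answer) could look like this:

from collections import Counter

seq = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1]

# Count how often each value is followed by a 0 and by a 1.
transitions = Counter(zip(seq, seq[1:]))

# Predict whichever value most often followed the current last value.
last = seq[-1]
prediction = max((0, 1), key=lambda v: transitions[(last, v)])
print(prediction)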

Our training data set will consist of k-dimensional data points together with their respective dependent values. For example, suppose k=3 and we have the following input data:

0,0,1,1,0,1,0,1,1,1,1,0,1,0,0,1...

We'll have the following training data:

(0,0,1) -> 1
(0,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 1
(1,0,1) -> 1
(0,1,1) -> 1
(1,1,1) -> 1
(1,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 0
(1,0,0) -> 1

Now, let's say you want to predict the next value in the sequence. The last 3 values are 0,0,1, so the model must predict the value of the function at (0,0,1), based on the training data.
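
Building these (window, next value) pairs takes only a few lines of plain Python (a sketch; `make_training_data` is my own name for it):

def make_training_data(seq, k):
    """Return (window, next value) pairs using a sliding window of size k."""
    return [(tuple(seq[i:i + k]), seq[i + k]) for i in range(len(seq) - k)]

seq = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1]
for window, target in make_training_data(seq, k=3):
    print(window, '->', target)  # (0, 0, 1) -> 1, (0, 1, 1) -> 0, ... as in the table above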

A popular and relatively simple approach would be to use a multivariate linear regression on a k-dimensional data space. Alternatively, consider using a neural network if linear regression underfits the training data set.

You might need to try out different values of k and test each against a validation set.
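
Here is a minimal sketch of the linear regression approach, assuming NumPy is available; thresholding the regression output at 0.5 turns it into a binary classifier:

import numpy as np

seq = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1]
k = 3

# Sliding-window design matrix and targets, as in the table above.
X = np.array([seq[i:i + k] for i in range(len(seq) - k)], dtype=float)
y = np.array(seq[k:], dtype=float)

# Fit ordinary least squares with an intercept column.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the value following the last window, here (0, 0, 1).
x_new = np.append(np.array(seq[-k:], dtype=float), 1.0)
print(int(x_new @ coef >= 0.5))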

answered Jan 08 '23 by Narek Saribekyan


You could use a maximum likelihood estimator for the Bernoulli distribution. In essence you would:

  • look at all observed values and estimate parameter p
  • then use p to determine the next value

In Python this could look like this:

#!/usr/bin/env python

from __future__ import division

signal = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0]

def maximum_likelihood(s, last=None):
    """
    The maximum likelihood estimator selects the parameter value which gives
    the observed data the largest possible probability.

    http://mathworld.wolfram.com/MaximumLikelihood.html

    If `last` is given, only use the `last` most recent values.
    """
    if not last:
        return sum(s) / len(s)
    # Use only the `last` most recent observations.
    s = s[-last:]
    return sum(s) / len(s)

if __name__ == '__main__':
    hits = []

    print('p\tpredicted\tcorrect\tsignal')
    print('-\t---------\t-------\t------')

    for i in range(1, len(signal) - 1):
        p = maximum_likelihood(signal[:i]) # p = maximum_likelihood(signal[:i], last=2)
        prediction = int(p >= 0.5)
        hits.append(prediction == signal[i])
        print('%0.3f\t%s\t\t%s\t%s' % (
            p, prediction, prediction == signal[i], signal[:i]))

    print('accuracy: %0.3f' % (sum(hits) / len(hits)))

The output would look like this:

# p       predicted  correct signal
# -       ---------  ------- ------
# 1.000   1          False   [1]
# 0.500   1          True    [1, 0]
# 0.667   1          True    [1, 0, 1]
# 0.750   1          False   [1, 0, 1, 1]
# 0.600   1          False   [1, 0, 1, 1, 0]
# 0.500   1          True    [1, 0, 1, 1, 0, 0]
# 0.571   1          False   [1, 0, 1, 1, 0, 0, 1]
# 0.500   1          True    [1, 0, 1, 1, 0, 0, 1, 0]
# 0.556   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1]
# 0.600   1          False   [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
# 0.545   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
# 0.583   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
# 0.615   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
# 0.643   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
# 0.667   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]
# 0.688   1          False   [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
# 0.647   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
# 0.667   1          False   [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
# 0.632   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0]
# 0.650   1          True    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1]
# accuracy: 0.650

You could vary the window size for performance reasons or to favor recent events.

In the above example, you can pass e.g. `last=3` to estimate p from only the 3 most recently observed values; depending on how stable the signal is, a shorter window may improve or hurt the accuracy, so it is worth tuning.
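
For instance, here is a short sketch that compares a few window sizes, reusing `signal` and `maximum_likelihood` from above:

def accuracy(s, last=None):
    """Accuracy of the threshold predictor over the whole signal."""
    hits = [int(maximum_likelihood(s[:i], last=last) >= 0.5) == s[i]
            for i in range(1, len(s) - 1)]
    return sum(hits) / len(hits)

for window in (None, 2, 3, 5):
    print(window, accuracy(signal, last=window))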

Update: Inspired by Narek's answer I added a logistic regression classifier example to the gist.

answered Jan 08 '23 by miku