
Implementing a machine learning-like optimizer


I am trying to predict the trend of an internet post.

I have the number of comments and votes the post has 2 minutes after being posted (the window can change, but 2 minutes should be enough).

Currently I use this formula:

predicted_votes = (votes_per_minute + n_comments * 60 * h) * k

I then find k experimentally: I get the post data, wait an hour, and update k with

k = (older_k + actual_votes/predicted_votes) / 2

And so on. This kind of works. The accuracy is pretty low (40-50%), but it gives me a rough idea of how the post is going to perform.
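For reference, the update rule above can be sketched in Python as follows. The function names, the starting value of k and the two observations are hypothetical and only illustrate the mechanics; skipping the update when the prediction is zero is also an assumption, not something the question specifies.

```python
def predict_votes(votes_per_minute, n_comments, h, k):
    """Current formula from the question."""
    return (votes_per_minute + n_comments * 60 * h) * k


def update_k(older_k, actual_votes, predicted_votes):
    """Average the previous k with the ratio observed for the latest post."""
    if predicted_votes == 0:
        # Assumption: leave k unchanged when the ratio is undefined.
        return older_k
    return (older_k + actual_votes / predicted_votes) / 2


k = 1.0  # hypothetical starting value
# Hypothetical observations: (votes_per_minute, n_comments, actual votes after 1 hour)
observations = [(1.0, 0, 2.0), (2.0, 0, 4.0)]
for votes_per_minute, n_comments, actual_votes in observations:
    predicted_votes = predict_votes(votes_per_minute, n_comments, 1, k)
    k = update_k(k, actual_votes, predicted_votes)

print(k)
```

Written this way, the update is a moving average that gives the newest ratio a weight of 1/2, so k is dominated by the last few posts rather than by the whole history.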

I was wondering if I could employ a more complex equation, something like:

predicted_votes = ((votes_per_minute * x + n_comments * y) * 60 * hour) * k # Hour stands for 'how many hours to predict'

And then optimize the parameters to approximate a bit better.

I would assume that I could use Machine Learning, although I don't have a GPU available (that's right, I'm running on integrated graphics, blame Mojave), so I am trying this approach instead.

So the question boils down to, how do I optimize those parameters (k,x,y) to get a better accuracy?

EDIT:

I tried following what @Alexis said, and this is where I am at right now:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


initial_votes_list = [1.41, 0.9, 0.94, 0.47, 0]
initial_comment_list = [0, 3, 0, 1, 64]

def func(x, k, t, s):
    votes_per_minute = x[0]
    n_comments = x[1]
    return ((votes_per_minute * t + n_comments * s) * 60) * k



xdata = [1.41,0]
y = func(xdata, 2.5, 1.3, 0.5)
np.random.seed(1729)
ydata = y + 5
plt.plot(xdata, ydata, 'b-', label='data')

popt, pcov = curve_fit(func, xdata, ydata)

plt.plot(xdata, func(xdata, *popt), 'g--',
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

plt.xlabel('Time')
plt.ylabel('Score')
plt.legend()
plt.show()

I am not sure how to feed the data I have (votes_per_minute, n_comments), nor how I could tell the algorithm that y axis is actually time.

EDIT 2:

I tried doing what @Alexis told me, but I am unsure what to use as actual_score: a number doesn't work, and neither does a list. Also, I want to predict the 'score', not the number of comments.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

initial_votes_list = [1.41, 0.9, 0.94, 0.47, 0]
initial_comment_list = [0, 3, 0, 1, 64]

final_score = [26,12,13,14,229]

def func(x,k,t,s):
    return ((x[0]*k+x[1]*t)*60*x[2])*s
X = [[a,b,c] for a,b,c in zip(initial_votes_list,initial_comment_list,[i for i in range(len(initial_votes_list))])]
y = actual_votes # What is this?

popt, pcov = curve_fit(func, X, y)

plt.plot(xdata, func(xdata, *popt), 'g--',
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

plt.xlabel('Time')
plt.ylabel('Score')
plt.legend()
plt.show()
G. Ramistella asked May 16 '19 10:05



1 Answer

You don't need ML for this (I think it is overkill here). SciPy provides a simple way to fit a curve to the observations you have.

scipy.optimize.curve_fit lets you fit a function with unknown parameters to your observations. Since you already know the general form of the function, estimating its parameters is a well-known statistics problem (least-squares fitting), so SciPy should be enough.

We can take a small example to demonstrate this. First we import what we need and define the function to fit:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy.optimize import curve_fit
>>>
>>> def func(x, a, b, c):
...     return a * np.exp(-b * x) + c

Define the data to be fit with some noise:

>>> xdata = np.linspace(0, 4, 50)
>>> y = func(xdata, 2.5, 1.3, 0.5)
>>> np.random.seed(1729)
>>> y_noise = 0.2 * np.random.normal(size=xdata.size)
>>> ydata = y + y_noise
>>> plt.plot(xdata, ydata, 'b-', label='data')

Then we fit the function (a * exp(-b * x) + c = y) to the data using SciPy:

popt, pcov = curve_fit(func, xdata, ydata)

You could add constraints (bounds) to this, but for your problem it is not necessary. By the way, this example comes from the scipy.optimize.curve_fit documentation page, which covers everything you need to know to use curve_fit.
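As a rough illustration of what curve_fit returns, the sketch below reruns the same example as a plain script and reads the parameter uncertainties off pcov. The use of np.random.default_rng and the variable name perr are choices made here, not part of the original example.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b, c):
    return a * np.exp(-b * x) + c


# Synthetic data: the true parameters are a=2.5, b=1.3, c=0.5, plus Gaussian noise.
rng = np.random.default_rng(1729)
xdata = np.linspace(0, 4, 50)
ydata = func(xdata, 2.5, 1.3, 0.5) + 0.2 * rng.normal(size=xdata.size)

# popt holds the fitted parameters, pcov their estimated covariance matrix.
popt, pcov = curve_fit(func, xdata, ydata)

# The square roots of the diagonal of pcov are one-standard-deviation errors.
perr = np.sqrt(np.diag(pcov))
for name, value, err in zip("abc", popt, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

Large values in perr relative to popt are a hint that the data do not pin the corresponding parameter down well.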

Edit

It seems you are having a hard time figuring out how to use this. Let's go slowly and methodically to make sure we agree at every step:

  • you want to predict a target value (the number of comments, or the final score as clarified in the question); this is your y. It is known from observations, not calculated
  • you have three inputs: votes_per_minute, n_comments and the hour h
  • and last but not least, you have three parameters of the function to fit (x, y, k)

So one sample X[i] should look like [votes_per_minute, n_comments, h]. Your formula, with the parameters renamed to k, t and s, becomes y = ((votes_per_minute * k + n_comments * t) * 60 * h) * s, which gives:

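def func(x, k, t, s):
    # x[0] = votes_per_minute, x[1] = n_comments, x[2] = h
    return ((x[0] * k + x[1] * t) * 60 * x[2]) * s

# Transpose so that X has one row per input and one column per sample.
X = np.array([[a, b, c] for a, b, c in zip(initial_votes_list, initial_comment_list, range(len(initial_votes_list)))]).T
y = final_score  # the observed scores, one per sample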

and then:

popt, pcov = curve_fit(func, X, y) 

(If I understand your issue correctly. If not, I don't see where the problem is.)
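The array layout is the part that tends to cause trouble: for a function of several inputs, curve_fit expects xdata with one row per input and one column per sample. A minimal sketch of that layout, assuming (this is not stated in the question) that every score was observed one hour after posting, so h is 1 for all samples:

```python
import numpy as np

initial_votes_list = [1.41, 0.9, 0.94, 0.47, 0]
initial_comment_list = [0, 3, 0, 1, 64]
hours = [1, 1, 1, 1, 1]  # assumption: all scores observed after 1 hour

# Stack the inputs as rows: shape is (number of inputs, number of samples).
X = np.vstack([initial_votes_list, initial_comment_list, hours])
print(X.shape)  # (3, 5)


def func(x, k, t, s):
    # x[0], x[1] and x[2] are whole rows, so the result has one value per sample.
    return ((x[0] * k + x[1] * t) * 60 * x[2]) * s


pred = func(X, 1.0, 1.0, 1.0)
print(pred.shape)  # (5,)
```

The target y passed to curve_fit must then have the same length as the number of columns in X, which is why a single number does not work.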

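import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

initial_votes_list = [1.41, 0.9, 0.94, 0.47, 0]
initial_comment_list = [0, 3, 0, 1, 64]

final_score = [26, 12, 13, 14, 229]

def func(x, k, t, s):
    # x[0] = votes_per_minute, x[1] = n_comments, x[2] = hour h
    return ((x[0] * k + x[1] * t) * 60 * x[2]) * s

# One row per input, one column per sample; the sample index 0..4 stands in for the hour h.
X = np.array([[a, b, c] for a, b, c in zip(initial_votes_list, initial_comment_list, range(len(initial_votes_list)))]).T
# Placeholder targets for the demo; use your observed values (e.g. final_score) instead.
y = [0.12, 0.20, 0.5, 0.9, 1]

popt, pcov = curve_fit(func, X, y)

print(popt)
# Output reported in the original answer:
# [-6.65969099e+00 -6.99241803e-02 -9.33412000e-04]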
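One thing worth noting about func: the factor s multiplies both k and t, so only the products k*s and t*s affect the prediction, and the three parameters cannot be determined separately from data. A minimal sketch that drops s and fits against final_score is given below. The name reduced_func, the parameter names a and b, and the assumption that every score was observed one hour after posting are mine, not the question's; five samples are also far too few for a reliable estimate.

```python
import numpy as np
from scipy.optimize import curve_fit

initial_votes_list = [1.41, 0.9, 0.94, 0.47, 0]
initial_comment_list = [0, 3, 0, 1, 64]
final_score = [26, 12, 13, 14, 229]
hours = [1, 1, 1, 1, 1]  # assumption: all scores observed after 1 hour


def reduced_func(x, a, b):
    # Same formula with the redundant factor s removed:
    # a plays the role of k*s and b the role of t*s.
    return (x[0] * a + x[1] * b) * 60 * x[2]


# One row per input, one column per sample.
X = np.vstack([initial_votes_list, initial_comment_list, hours])
y = np.array(final_score, dtype=float)

popt, pcov = curve_fit(reduced_func, X, y)
predicted = reduced_func(X, *popt)

print("fitted a, b:", popt)
print("predicted scores:", np.round(predicted, 1))
print("observed scores: ", y)
```

Because this reduced model is linear in a and b, the same estimate could also be obtained with an ordinary linear least-squares solver; curve_fit is kept here to stay close to the approach in the answer.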
Frayal answered Oct 19 '22 09:10