Implementing a machine learning-like optimizer

I am trying to predict the trend of an internet post.

I have the number of comments and votes the post has 2 minutes after being posted (the delay can change, but 2 minutes should be enough).

Currently I use this formula:

predicted_votes = (votes_per_minute + n_comments * 60 * h) * k

I then find k experimentally: I get the post data, wait an hour, and compute

k = (older_k + actual_votes/predicted_votes) / 2

And so on. This kind of works. The accuracy is pretty low (40 - 50%), but it gives me a rough idea of how the post is going to perform.
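To make the update concrete, here is a minimal sketch of the formula and the k update described above. The function names and the sample numbers are made up for illustration; they are not real measurements.

```python
def predict_votes(votes_per_minute, n_comments, h, k):
    """Current formula, exactly as written above."""
    return (votes_per_minute + n_comments * 60 * h) * k


def update_k(older_k, actual_votes, predicted_votes):
    """Average the previous k with the correction ratio seen for the latest post."""
    return (older_k + actual_votes / predicted_votes) / 2


# Hypothetical post: 1.5 votes/minute and 2 comments after 2 minutes, 1-hour horizon.
k = 1.0
predicted = predict_votes(1.5, 2, h=1, k=k)
k = update_k(k, actual_votes=90, predicted_votes=predicted)
print(predicted, k)
```

Averaging this way gives the most recent post a weight of 1/2, the one before it 1/4, and so on, so k tracks the latest posts closely and can jump around a lot.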

I was wondering if I could employ a more complex equation, something like:

predicted_votes = ((votes_per_minute * x + n_comments * y) * 60 * hour) * k # Hour stands for 'how many hours to predict'

And then optimize the parameters to approximate a bit better.

I would assume that I could use Machine Learning, although I don't have a GPU available (that's right, I'm running on integrated graphics, blame Mojave), so I am trying this approach instead.

So the question boils down to: how do I optimize those parameters (k, x, y) to get better accuracy?

EDIT:

I tried following what @Alexis said, and this is where I am at right now:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


initial_votes_list = [1.41, 0.9, 0.94, 0.47, 0]
initial_comment_list = [0, 3, 0, 1, 64]

def func(x, k, t, s):
    votes_per_minute = x[0]
    n_comments = x[1]
    return ((votes_per_minute * t + n_comments * s) * 60) * k


xdata = [1.41,0]
y = func(xdata, 2.5, 1.3, 0.5)
np.random.seed(1729)
ydata = y + 5
plt.plot(xdata, ydata, 'b-', label='data')

popt, pcov = curve_fit(func, xdata, ydata)

plt.plot(xdata, func(xdata, *popt), 'g--',
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

plt.xlabel('Time')
plt.ylabel('Score')
plt.legend()
plt.show()

I am not sure how to feed the data I have (votes_per_minute, n_comments), nor how I could tell the algorithm that y axis is actually time.

EDIT 2:

I tried doing what @Alexis told me, but I am unsure what to use as actual_score: a number doesn't work, and neither does a list. Also, I want to predict the 'score', not the number of comments.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

initial_votes_list = [1.41, 0.9, 0.94, 0.47, 0]
initial_comment_list = [0, 3, 0, 1, 64]

final_score = [26,12,13,14,229]

def func(x,k,t,s):
    return ((x[0]*k+x[1]*t)*60*x[2])*s
X = [[a,b,c] for a,b,c in zip(initial_votes_list,initial_comment_list,[i for i in range(len(initial_votes_list))])]
y = actual_votes # What is this?

popt, pcov = curve_fit(func, X, y)

plt.plot(xdata, func(xdata, *popt), 'g--',
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

plt.xlabel('Time')
plt.ylabel('Score')
plt.legend()
plt.show()

asked May 16 '19 10:05

G. Ramistella

1 Answer

You don't need ML for this (I think it would be overkill here). SciPy provides a nice and easy way to fit a curve to the observations you have.

scipy.optimize.curve_fit allows you to fit a function with unknown parameters to your observations. As you already know the general form of the function, estimating its parameters is a well-known statistics problem (least-squares fitting), so SciPy should be enough.

We can take a small example to demonstrate this. First we import what we need and define the model function:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy.optimize import curve_fit
>>>
>>> def func(x, a, b, c):
...     return a * np.exp(-b * x) + c

Define the data to be fit with some noise:

>>> xdata = np.linspace(0, 4, 50)
>>> y = func(xdata, 2.5, 1.3, 0.5)
>>> np.random.seed(1729)
>>> y_noise = 0.2 * np.random.normal(size=xdata.size)
>>> ydata = y + y_noise
>>> plt.plot(xdata, ydata, 'b-', label='data')

Then we fit the function (a * exp(-b * x) + c) to the data using SciPy:

popt, pcov = curve_fit(func, xdata, ydata)

You could add constraints to this, but for your problem it is not necessary. By the way, this example is taken from the scipy.optimize.curve_fit documentation page, which covers everything you need to know to use curve_fit.
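As a hedged sketch of what to do with the result: popt holds the fitted values of a, b and c, and the diagonal of pcov holds their estimated variances, so the square root of that diagonal gives one-standard-deviation errors. The bounds argument shows the optional constraints mentioned above; the particular bounds chosen here are only an illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b, c):
    return a * np.exp(-b * x) + c


# Same synthetic data as in the example above.
xdata = np.linspace(0, 4, 50)
np.random.seed(1729)
ydata = func(xdata, 2.5, 1.3, 0.5) + 0.2 * np.random.normal(size=xdata.size)

# Unconstrained fit: popt should land near the true values (2.5, 1.3, 0.5).
popt, pcov = curve_fit(func, xdata, ydata)
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation error per parameter
print("fit:", popt)
print("std errors:", perr)

# Constrained fit: 0 <= a <= 3, 0 <= b <= 1, 0 <= c <= 0.5 (illustrative bounds).
popt_bounded, _ = curve_fit(func, xdata, ydata, bounds=(0, [3.0, 1.0, 0.5]))
print("bounded fit:", popt_bounded)
```

Large values in perr relative to popt are a sign that the data do not pin that parameter down well.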

Edit

It seems you are having a hard time figuring out how to use this. Let's go slowly, one step at a time, to make sure we agree at every stage:

You want to predict the final score of the post: this is your y. It is known (observed), not calculated.
You have three input variables: votes_per_minute, n_comments and the hour h.
And last but not least, you have three parameters to fit (x, y, k in your formula; called k, t, s below).

So X[i] (one sample) should look like this: [votes_per_minute, n_comments, h]. With your formula y = ((votes_per_minute * k + n_comments * t) * 60 * h) * s, replacing the names gives:

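def func(x, k, t, s):
    # x has shape (3, n_samples): rows are votes_per_minute, n_comments, h
    return ((x[0]*k + x[1]*t) * 60 * x[2]) * s

# the sample index stands in for the hour h here
hours = range(len(initial_votes_list))
X = np.array([[a, b, c] for a, b, c in zip(initial_votes_list, initial_comment_list, hours)]).T
y = final_score  # observed scores, one per sample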

and then:

popt, pcov = curve_fit(func, X, y)

(If I understand your issue correctly. If not, I don't see where the problem is.)
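A small sketch of why the .T is there, assuming, as above, that the sample index is used as the hour h: curve_fit hands X to func unchanged, so func must turn an array of shape (3, n_samples) into one prediction per sample. The parameter values passed to func below are arbitrary and only serve to check the shapes.

```python
import numpy as np

initial_votes_list = [1.41, 0.9, 0.94, 0.47, 0]
initial_comment_list = [0, 3, 0, 1, 64]
hours = list(range(len(initial_votes_list)))  # 0, 1, 2, 3, 4 (stand-in values)


def func(x, k, t, s):
    return ((x[0] * k + x[1] * t) * 60 * x[2]) * s


# One row per sample: shape (5, 3).
samples = np.array(list(zip(initial_votes_list, initial_comment_list, hours)), dtype=float)
# One row per variable: shape (3, 5), so x[0], x[1], x[2] are whole columns of data.
X = samples.T

predictions = func(X, 1.0, 1.0, 1.0)  # arbitrary parameters, just to check shapes
print(samples.shape, X.shape, predictions.shape)
print(predictions)
```

Note that with h = 0 the first sample predicts 0 whatever the parameters are, so real hour values are needed for a meaningful fit.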

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

initial_votes_list = [1.41, 0.9, 0.94, 0.47, 0]
initial_comment_list = [0, 3, 0, 1, 64]

final_score = [26,12,13,14,229]

def func(x,k,t,s):
    return ((x[0]*k+x[1]*t)*60*x[2])*s
X = np.array([[a,b,c] for a,b,c in zip(initial_votes_list,initial_comment_list,[i for i in range(len(initial_votes_list))])]).T
y = [0.12,0.20,0.5,0.9,1]  # example targets; to fit the real data, use y = final_score

popt, pcov = curve_fit(func, X, y)



print(popt)
# [-6.65969099e+00 -6.99241803e-02 -9.33412000e-04]
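One caveat, offered as a sketch rather than a definitive fix: in this formula s only ever multiplies k and t, so the data can determine the products k*s and t*s but not the three values separately, and many different popt give identical predictions. Assuming a 1-hour horizon for every post (an assumption, not something stated in the question), a two-parameter version fitted to final_score could look like this; reduced_func, a, b and the error metric are illustrative names.

```python
import numpy as np
from scipy.optimize import curve_fit

initial_votes_list = [1.41, 0.9, 0.94, 0.47, 0]
initial_comment_list = [0, 3, 0, 1, 64]
final_score = [26, 12, 13, 14, 229]
hours = [1, 1, 1, 1, 1]  # assumption: every score was measured after 1 hour


def reduced_func(x, a, b):
    # a plays the role of k*s and b the role of t*s in the original formula
    return (x[0] * a + x[1] * b) * 60 * x[2]


X = np.array([initial_votes_list, initial_comment_list, hours], dtype=float)  # shape (3, 5)
y = np.array(final_score, dtype=float)

popt, pcov = curve_fit(reduced_func, X, y)
predicted = reduced_func(X, *popt)

# Mean absolute percentage error over the posts used for fitting
mape = np.mean(np.abs(predicted - y) / y)
print("a, b:", popt)
print("predicted:", np.round(predicted, 1))
print("mean absolute percentage error:", round(float(mape), 3))
```

With only five posts this illustrates the mechanics and is not a reliable model; the error should be measured on posts that were not used for the fit.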

answered Oct 19 '22 09:10

Frayal
