I am trying to fit a distribution to some values. This is my code
from __future__ import print_function
import pandas as pd
import numpy as np
import scipy as sp
import scipy.optimize as opt
import scipy.stats
import matplotlib.pyplot as plt
values = np.random.pareto(1.5, 10000)
loc = values.min()
scale = 1
def cost_function(alpha):
cost = -sp.stats.pareto(alpha, loc=loc, scale=scale).pdf(values)
return cost.sum()
opt_res = opt.fmin(cost_function, 1.5)
alpha_fit_v = sp.stats.pareto.fit(values, floc=loc, fscale=scale)
print('opt_res = ', opt_res,
' alpha_fit_v = ', alpha_fit_v)
I was expecting alpha_fit_v to be equivalent to opt_res but it is not. What's wrong?.
What's wrong?.
np.random.pareto has a different distribution than sp.stats.paretoIt does not make sense to sum inverse probabilities. You need to use the logarithm:
def cost_function(alpha):
cost = -sp.stats.pareto(alpha, loc=loc, scale=scale).logpdf(values)
return cost.sum()
np.random.pareto has a different distribution than sp.stats.paretoThis one is tricky, but you may have noticed that not even sp.stats.pareto.fit returns the correct result. This is because scipy's Pareto distribution cannot fit the data generated by numpy.
import matplpotlib.pyplot as plt
import scipys as sp
import numpy as np
plt.subplot(2, 1, 1)
plt.hist(np.random.pareto(1.5, 10000), 1000) # This is a Lomax or Pareto II distribution
plt.xlim(0, 10)
plt.subplot(2, 1, 2)
plt.hist(sp.stats.pareto.rvs(1.5, size=1000), 1000) # This is a Pareto distribution
plt.xlim(0, 10)

That said, this will work as expected:
values = sp.stats.pareto.rvs(1.5, size=1000)
loc = 0
scale = 1
def cost_function(alpha):
cost = -sp.stats.pareto(alpha, loc=loc, scale=scale).logpdf(values)
return cost.sum()
opt_res = opt.fmin(cost_function, 1.5)
alpha_fit_v = sp.stats.pareto.fit(values, floc=loc, fscale=scale)
print('opt_res = ', opt_res,
' alpha_fit_v = ', alpha_fit_v)
# opt_res = [ 1.49611816] alpha_fit_v = (1.4960937500000013, 0, 1)
According to the documentation numpy.random.pareto does not quite draw from the Pareto distribution:
Draw samples from a Pareto II or Lomax distribution with specified shape.
The Lomax or Pareto II distribution is a shifted Pareto distribution. The classical Pareto distribution can be obtained from the Lomax distribution by adding 1 and multiplying by the scale parameter m (see Notes).
So you have two alternatives if using numpy to generate the data:
loc=-1 for the scipy distribution.values = np.random.pareto(1.5, 10000) + 1 and set loc=0.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With