how to replicate scipy.stats.fit using optimization function?

Question

I am trying to fit a distribution to some values. This is my code

from __future__ import print_function
import pandas as pd
import numpy as np
import scipy as sp
import scipy.optimize as opt
import scipy.stats
import matplotlib.pyplot as plt

    values = np.random.pareto(1.5, 10000)
    loc = values.min()
    scale = 1


    def cost_function(alpha):
        cost = -sp.stats.pareto(alpha, loc=loc, scale=scale).pdf(values)
        return cost.sum()

    opt_res = opt.fmin(cost_function, 1.5)

    alpha_fit_v = sp.stats.pareto.fit(values, floc=loc, fscale=scale)


    print('opt_res = ', opt_res,
          ' alpha_fit_v = ', alpha_fit_v)

I was expecting alpha_fit_v to be equivalent to opt_res but it is not. What's wrong?.

MB-F · Accepted Answer

What's wrong?.

The cost function is wrong.
np.random.pareto has a different distribution than sp.stats.pareto

1. The cost function is wrong

It does not make sense to sum inverse probabilities. You need to use the logarithm:

def cost_function(alpha):
    cost = -sp.stats.pareto(alpha, loc=loc, scale=scale).logpdf(values)
    return cost.sum()

2. `np.random.pareto` has a different distribution than `sp.stats.pareto`

This one is tricky, but you may have noticed that not even sp.stats.pareto.fit returns the correct result. This is because scipy's Pareto distribution cannot fit the data generated by numpy.

import matplpotlib.pyplot as plt
import scipys as sp
import numpy as np

plt.subplot(2, 1, 1)
plt.hist(np.random.pareto(1.5, 10000), 1000)  # This is a Lomax or Pareto II distribution
plt.xlim(0, 10)

plt.subplot(2, 1, 2)
plt.hist(sp.stats.pareto.rvs(1.5, size=1000), 1000)  # This is a Pareto distribution
plt.xlim(0, 10)

distributions

That said, this will work as expected:

values = sp.stats.pareto.rvs(1.5, size=1000)
loc = 0
scale = 1

def cost_function(alpha):
    cost = -sp.stats.pareto(alpha, loc=loc, scale=scale).logpdf(values)
    return cost.sum()

opt_res = opt.fmin(cost_function, 1.5)

alpha_fit_v = sp.stats.pareto.fit(values, floc=loc, fscale=scale)

print('opt_res = ', opt_res,
      ' alpha_fit_v = ', alpha_fit_v)

# opt_res =  [ 1.49611816]  alpha_fit_v =  (1.4960937500000013, 0, 1)

According to the documentation numpy.random.pareto does not quite draw from the Pareto distribution:

Draw samples from a Pareto II or Lomax distribution with specified shape.

The Lomax or Pareto II distribution is a shifted Pareto distribution. The classical Pareto distribution can be obtained from the Lomax distribution by adding 1 and multiplying by the scale parameter m (see Notes).

So you have two alternatives if using numpy to generate the data:

You can set loc=-1 for the scipy distribution.
You can do values = np.random.pareto(1.5, 10000) + 1 and set loc=0.

how to replicate scipy.stats.fit using optimization function?

Tags:

python

statistics

scipy

Donbeo

1 Answers

1. The cost function is wrong

2. `np.random.pareto` has a different distribution than `sp.stats.pareto`

MB-F

Recent Activity

Donate For Us

how to replicate scipy.stats.fit using optimization function?

Tags:

python

statistics

scipy

Donbeo

1 Answers

1. The cost function is wrong

2. np.random.pareto has a different distribution than sp.stats.pareto

MB-F

Related questions

Recent Activity

Donate For Us

2. `np.random.pareto` has a different distribution than `sp.stats.pareto`