Hi I wanted to generate some random numbers with pareto distribution. I've found that this is possible using numpy. But I don't know hot to shape the outcome. For example i want to have results in range: 10-20, but how can i achieve this?
I know the syntax for using pareto from numpy
numpy.random.pareto(m, s)
I can't understand what m is for (I've been looking in wikipedia, but i don't understand one bit)? I know that s i size of generated tuple.
The documentation seems to have a mistake which might be confusing you.
Normally the parameter names in the call signature:
numpy.random.pareto(a, size=None)
Match the parameter names with the given details:
Parameters
----------
shape : float, > 0.
Shape of the distribution.
size : tuple of ints
Output shape. If the given shape is, e.g., ``(m, n, k)``, then
``m * n * k`` samples are drawn.
But you see that the first parameter is called both a
and shape
. Pass your desired shape as the first argument to the function to get a distribution of size
numbers (they're not a tuple
, but a numpy array
).
If you need to change the second parameter (called xm on wikipedia), then just add it to all values, as in the example from the docs:
Examples
--------
Draw samples from the distribution:
>>> a, m = 3., 1. # shape and mode
>>> s = np.random.pareto(a, 1000) + m
So, it is trivial to implement a lower bound: just use your lower bound for m
:
lower = 10 # the lower bound for your values
shape = 1 # the distribution shape parameter, also known as `a` or `alpha`
size = 1000 # the size of your sample (number of random values)
And create the distribution with the lower bound:
x = np.random.pareto(shape, size) + lower
However, the Pareto distribution is not bounded from above, so if you try to cut it off it will really be a truncated version of the distribution, which is not quite the same thing, so be careful. If the shape parameter much bigger than 1, the distribution decays algebraically, as x – (a+1), so you won't see very many large values anyway.
If you choose to implement the upper bound, a simple way is to generate the ordinary sample then remove any values that exceed your limit:
upper = 20
x = x[x<upper] # only values where x < upper
But now the size of your sample is (possibly) smaller. You could keep adding new ones (and filtering out the values that are too large) until the size is what you want, but it would be simpler to make it sufficiently large in the first place, then use only size
of them:
x = np.random.pareto(shape, size*5/4) + lower
x = x[x<upper][:size]
@askewchan Is the document changed?
According to the latest doc, m
should be used like this
a, m = 3., 2. # shape and mode
s = (np.random.pareto(a) + 1) * m
where a
is the shape, and m
is the scale (which is (xm) in Wikipedia).
This is the test code, the expected mean equals to the simulation result.
a = 2
m = 10
def subtask_service_time():
return (numpy.random.pareto(a) + 1) * m
print('Simulation mean:', sum([subtask_service_time() for _ in range(1000)]) / 1000)
print('Excepted mean:', a * m / (a - 1))
>>>>Simulation mean: 20.383399962437686
>>>>Excepted mean: 20.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With