Am a newcomer to R and need advice on how to draw random numbers from a limited area of a Pareto Distribution with parameters s & beta. (System: Windows 7, R 2.15.2.) (1) I have data in a vector data$t; each single data point I'll call data&tx For these data the parameters s & beta of a Pareto distribution are estimated following https://stats.stackexchange.com/questions/27426/how-do-i-fit-a-set-of-data-to-a-pareto-distribution-in-r <pre class="prettyprint"><code>pareto.MLE <- function(X) { n <- length(X) m <- min(X) a <- n/sum(log(X)-log(m)) return( c(m,a) ) } </code></pre> (2) Now I need to draw as many random numbers (RndNew) von this Pareto distribution (s, beta, see (1)) as there are observations (= data points: data$tx) . For the draw the area from which random numbers are drawn must be limited to the area where RndNewx >= data$tx; in other words: RndNewx must never be smaller than the corresponding data$tx. Question: how to tell R to restrict the area of a Pareto distribution from which to draw a random number to be RndNewx >= data$tx? Thanks a million for any help!

The standard approach to sampling from a truncated distribution has three steps. Here's an example with the normal distribution so you can get the idea. <pre class="prettyprint"><code>n <- 1000 lower_bound <- -1 upper_bound <- 1 </code></pre> Apply the CDF to your lower and upper bounds to find the quantiles of the edges of your distribution. <pre class="prettyprint"><code>(quantiles <- pnorm(c(lower_bound, upper_bound))) # [1] 0.1586553 0.8413447 </code></pre> Sample from a uniform distribution between those quantiles. <pre class="prettyprint"><code>uniform_random_numbers <- runif(n, quantiles[1], quantiles[2]) </code></pre> Apply the inverse CDF. <pre class="prettyprint"><code>truncated_normal_random_numbers <- qnorm(uniform_random_numbers) </code></pre> <img src="https://i.stack.imgur.com/RMGHB.png" alt="truncated normal random numbers"> <hr> The CDF for the pareto distribution is <pre class="prettyprint"><code>ppareto <- function(x, scale, shape) { ifelse(x > scale, 1 - (scale / x) ^ shape, 0) } </code></pre> And the inverse is <pre class="prettyprint"><code>qpareto <- function(y, scale, shape) { ifelse( y >= 0 & y <= 1, scale * ((1 - y) ^ (-1 / shape)), NaN ) } </code></pre> We can rework the above example to use these Pareto functions. <pre class="prettyprint"><code>n <- 1000 scale <- 1 shape <- 1 lower_bound <- 2 upper_bound <- 10 (quantiles <- ppareto(c(lower_bound, upper_bound), scale, shape)) uniform_random_numbers <- runif(n, quantiles[1], quantiles[2]) truncated_pareto_random_numbers <- qpareto(uniform_random_numbers, scale, shape) </code></pre> <img src="https://i.stack.imgur.com/jPzgO.png" alt="truncated Pareto distrbution"> <hr> To make it easier to reuse this code, we can wrap it into a function. The lower and upper bounds have default values that match the range of the distribution, so if you don't pass values in, then you'll get a non-truncated Pareto distribution. <pre class="prettyprint"><code>rpareto <- function(n, scale, shape, lower_bound = scale, upper_bound = Inf) { quantiles <- ppareto(c(lower_bound, upper_bound), scale, shape) uniform_random_numbers <- runif(n, quantiles[1], quantiles[2]) qpareto(uniform_random_numbers, scale, shape) } </code></pre>

Draw random numbers from restricted Pareto distribution

Tags:

random

r

Am a newcomer to R and need advice on how to draw random numbers from a limited area of a Pareto Distribution with parameters s & beta. (System: Windows 7, R 2.15.2.)

(1) I have data in a vector data$t; each single data point I'll call data&tx

For these data the parameters s & beta of a Pareto distribution are estimated following https://stats.stackexchange.com/questions/27426/how-do-i-fit-a-set-of-data-to-a-pareto-distribution-in-r

pareto.MLE <- function(X)
{
n <- length(X)
m <- min(X)
a <- n/sum(log(X)-log(m))
return( c(m,a) ) 
}

(2) Now I need to draw as many random numbers (RndNew) von this Pareto distribution (s, beta, see (1)) as there are observations (= data points: data$tx) . For the draw the area from which random numbers are drawn must be limited to the area where RndNewx >= data$tx; in other words: RndNewx must never be smaller than the corresponding data$tx.

Question: how to tell R to restrict the area of a Pareto distribution from which to draw a random number to be RndNewx >= data$tx?

Thanks a million for any help!

554

asked Jan 24 '13 08:01

user2006697

1 Answers

The standard approach to sampling from a truncated distribution has three steps. Here's an example with the normal distribution so you can get the idea.

n <- 1000
lower_bound <- -1
upper_bound <- 1

Apply the CDF to your lower and upper bounds to find the quantiles of the edges of your distribution.

(quantiles <- pnorm(c(lower_bound, upper_bound)))
# [1] 0.1586553 0.8413447

Sample from a uniform distribution between those quantiles.

uniform_random_numbers <- runif(n, quantiles[1], quantiles[2])

Apply the inverse CDF.

truncated_normal_random_numbers <- qnorm(uniform_random_numbers)

truncated normal random numbers

The CDF for the pareto distribution is

ppareto <- function(x, scale, shape)
{
  ifelse(x > scale, 1 - (scale / x) ^ shape, 0)
}

And the inverse is

qpareto <- function(y, scale, shape)
{
  ifelse(
    y >= 0 & y <= 1,
    scale * ((1 - y) ^ (-1 / shape)),
    NaN
  )
}

We can rework the above example to use these Pareto functions.

n <- 1000
scale <- 1
shape <- 1
lower_bound <- 2
upper_bound <- 10

(quantiles <- ppareto(c(lower_bound, upper_bound), scale, shape))
uniform_random_numbers <- runif(n, quantiles[1], quantiles[2])    
truncated_pareto_random_numbers <- qpareto(uniform_random_numbers, scale, shape)

truncated Pareto distrbution

To make it easier to reuse this code, we can wrap it into a function. The lower and upper bounds have default values that match the range of the distribution, so if you don't pass values in, then you'll get a non-truncated Pareto distribution.

rpareto <- function(n, scale, shape, lower_bound = scale, upper_bound = Inf)
{
  quantiles <- ppareto(c(lower_bound, upper_bound), scale, shape)
  uniform_random_numbers <- runif(n, quantiles[1], quantiles[2])    
  qpareto(uniform_random_numbers, scale, shape)
}

answered Oct 21 '22 06:10

Richie Cotton

Related questions
                            
                                compare adjacent elements of the same vector (avoiding loops)
                            
                                Is there something like a pmax index?
                            
                                replace .. with . in R
                            
                                Find rows in a data frame where two columns are equal
                            
                                add on.exit expr to parent call?
                            
                                ggplot2: plotting order of factors within a geom
                            
                                lm predict won't predict
                            
                                strsplit one column with exact information into two column
                            
                                R ggplot geom_tile without fill color
                            
                                passing objects to return results in an error
                            
                                Pass subset argument through a function to subset
                            
                                ggplot2: Thresholds for scale_alpha()
                            
                                ggplot vertically justify legend
                            
                                Split vector randomly into two sets
                            
                                Iterate over cartesian product of vectors
                            
                                Converting char to date time
                            
                                Plotting only x and y axis borders (no box) in ggplot2?
                            
                                workflow between Latex and R screwed
                            
                                GLM function in R with log link not working
                            
                                Fastest way to drop rows with missing values?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With