
How do I select a random item from a weighted array in Julia?

Consider two one-dimensional arrays: one with the items to select from, and one with the probability of drawing each corresponding item.

    items = ["a", 2, 5, "h", "hello", 3]
    weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]

In Julia, how can one randomly select an item from items, using weights as the probability of drawing each item?

Remi.b asked Dec 19 '14 04:12




2 Answers

Use the StatsBase.jl package:

    import Pkg
    Pkg.add("StatsBase")  # only needed once

    using StatsBase
    items = ["a", 2, 5, "h", "hello", 3]
    weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]
    sample(items, Weights(weights))

Or if you want to sample many:

    # With replacement
    my_samps = sample(items, Weights(weights), 10)
    # Without replacement
    my_samps = sample(items, Weights(weights), 2, replace=false)

(In older versions of StatsBase, Weights was called WeightVec.)

You can learn more about Weights and why it exists in the StatsBase.jl documentation. The sampling algorithms in StatsBase are efficient and choose between different approaches depending on the size of the input.

IainDunning answered Oct 03 '22 06:10


Here's a much simpler approach which only uses Julia's base library:

sample(items, weights) = items[findfirst(cumsum(weights) .> rand())] 

Example:

    julia> sample(["a", 2, 5, "h", "hello", 3], [0.1, 0.1, 0.2, 0.2, 0.1, 0.3])
    "h"

This is less efficient than StatsBase.jl, since every call builds and scans a new cumulative-sum vector, but for small vectors it's fine.
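
If many draws are needed from the same weights, one option is to compute the cumulative weights once and binary-search them on each draw. The following is a minimal sketch using only Base and the Random standard library, assuming Julia 1.3 or later. `make_sampler` is a hypothetical helper name, not part of any package, and the seed is arbitrary.

```julia
using Random

# Hypothetical helper: precompute the cumulative weights once and return
# a closure that draws one item per call.
function make_sampler(items, weights; rng=Random.default_rng())
    cum = cumsum(weights)      # running totals, computed a single time
    total = cum[end]           # equals 1 only if the weights are normalized
    function draw()
        # Binary search for the first cumulative weight >= a uniform draw in [0, total).
        i = searchsortedfirst(cum, rand(rng) * total)
        # Clamp the index as a guard against floating-point rounding at the top end.
        return items[min(i, length(items))]
    end
    return draw
end

items = ["a", 2, 5, "h", "hello", 3]
weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]

draw = make_sampler(items, weights; rng=MersenneTwister(1234))
draws = [draw() for _ in 1:10_000]
```

Each draw then costs a binary search over the precomputed vector instead of a fresh cumsum.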

Also, if the weights do not sum to 1, replace cumsum(weights) with cumsum(weights ./ sum(weights)).
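
An alternative to normalizing is to scale the random draw by the total weight, which gives the same distribution. The sketch below does that and also guards against findfirst returning nothing. `weighted_choice` is a hypothetical name, picked so it does not clash with StatsBase's sample, and the integer weights are made-up values in the same 1:1:2:2:1:3 ratio as the question's.

```julia
# Hypothetical Base-only helper; accepts weights that need not sum to 1.
function weighted_choice(items, weights)
    length(items) == length(weights) ||
        throw(ArgumentError("items and weights must have the same length"))
    cum = cumsum(weights)
    cum[end] > 0 || throw(ArgumentError("weights must have a positive sum"))
    # Scale the uniform draw by the total instead of normalizing the weights.
    r = rand() * cum[end]
    i = findfirst(x -> x > r, cum)
    # findfirst returns `nothing` when no element qualifies; fall back to the last item.
    return items[something(i, lastindex(items))]
end

# Tally 10,000 draws with unnormalized integer weights.
counts = Dict{Any,Int}()
for _ in 1:10_000
    x = weighted_choice(["a", 2, 5, "h", "hello", 3], [1, 1, 2, 2, 1, 3])
    counts[x] = get(counts, x, 0) + 1
end
```

With these weights the item 3 should come up in roughly 30% of the draws, though the exact counts vary from run to run.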

Miles answered Oct 03 '22 07:10