
What is the difference between the random.choices() and random.sample() functions?

I have the following list: list = [1,1,2,2].

After applying the sample method (rd.sample(list, 3)), the output is [1, 1, 2].

After applying the choices method (rd.choices(list, k=3)), the output is [2, 1, 2].

What is the difference between these two methods? When should one be preferred over the other?

Ganesh M asked Jan 16 '20 06:01




1 Answer

The fundamental difference is that random.choices() samples with replacement, while random.sample() samples without replacement. With random.choices(), every draw is made from the entire sequence, so the element at a given position can (eventually) be drawn more than once. With random.sample(), an element that has been picked is removed from the population, so each position can be drawn at most once.

Note that "replacement" here means placing the drawn element back into the population, not substituting it with something else.

To better understand it, let's consider the following example:

    import random

    random.seed(0)

    ll = list(range(10))

    print(random.sample(ll, 10))
    # [6, 9, 0, 2, 4, 3, 5, 1, 8, 7]

    print(random.choices(ll, k=10))
    # [5, 9, 5, 2, 7, 6, 2, 9, 9, 8]

As you can see, random.sample() does not produce repeating elements, while random.choices() does.
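To make the "placed back" idea concrete, here is a minimal sketch of the two drawing schemes written by hand. The function names draw_without_replacement and draw_with_replacement are made up for illustration; this is not how the random module implements sample() and choices(), only a model of the behaviour described above.

```python
import random


def draw_without_replacement(population, k):
    """Hypothetical stand-in for random.sample(): drawn items leave the pool."""
    pool = list(population)
    if k < 0 or k > len(pool):
        raise ValueError("cannot draw more items than the pool holds")
    drawn = []
    for _ in range(k):
        index = random.randrange(len(pool))
        # pop() removes the item, so this position cannot be drawn again
        drawn.append(pool.pop(index))
    return drawn


def draw_with_replacement(population, k):
    """Hypothetical stand-in for random.choices(): the pool never shrinks."""
    pool = list(population)
    # every draw looks at the full pool, so repeats are possible
    return [pool[random.randrange(len(pool))] for _ in range(k)]


print(draw_without_replacement(range(10), 10))  # some permutation of 0..9
print(draw_with_replacement(range(10), 10))     # may contain repeats
```

The only difference between the two functions is whether the drawn item is removed from the pool, which is exactly the with/without replacement distinction.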

In your example, both methods return repeated values because the original sequence itself contains repeated values. With random.sample(), however, those repeated values must come from different positions of the original input.
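A quick way to check this on the list from the question is to sample positions rather than values. The sketch below assumes the question's list [1, 1, 2, 2]; the helper fits_population is a name introduced here for illustration only.

```python
import random
from collections import Counter

population = [1, 1, 2, 2]
available = Counter(population)  # how often each value occurs in the input


def fits_population(drawn):
    """True if no value is drawn more often than it occurs in population."""
    return all(n <= available[v] for v, n in Counter(drawn).items())


# Sampling indices makes the mechanism visible:
# sample() never returns the same position twice.
positions = random.sample(range(len(population)), 3)
values = [population[i] for i in positions]
print(positions, values)

# choices() draws each element independently, so over many trials it
# also produces results such as [1, 1, 1], which sample() cannot return
# because the input holds only two 1s.
impossible_for_sample = [
    drawn
    for drawn in (random.choices(population, k=3) for _ in range(1000))
    if not fits_population(drawn)
]
print(len(impossible_for_sample) > 0)
```

So a result like [1, 1, 2] is possible with both functions, but [1, 1, 1] or [2, 2, 2] can only come from random.choices().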

Finally, you cannot sample() more elements than the input sequence contains, while this is not an issue with choices():

    # print(random.sample(ll, 20))
    # ValueError: Sample larger than population or is negative

    print(random.choices(ll, k=20))
    # [9, 3, 7, 8, 6, 4, 1, 4, 6, 9, 9, 4, 8, 2, 8, 5, 0, 7, 3, 8]
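If you want to see the size limit without crashing the script, a small sketch like the following catches the exception instead of commenting the call out. It redefines ll so that it runs on its own.

```python
import random

ll = list(range(10))

# Asking sample() for more items than the population holds raises ValueError.
try:
    random.sample(ll, 20)
except ValueError as exc:
    error = exc
    print("sample() failed:", exc)
else:
    error = None

# The largest valid sample is the whole population, i.e. a random permutation.
full = random.sample(ll, len(ll))
print(sorted(full) == ll)

# choices() has no such limit, since elements are placed back after each draw.
with_replacement = random.choices(ll, k=20)
print(len(with_replacement))
```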

A more generic and theoretical discussion of the sampling process can be found on Wikipedia.

norok2 answered Sep 19 '22 13:09