Sampling with repetition in Python

Tags:

python

I have a string with 50ish elements. I need to randomize it and generate a much longer string. I found that random.sample() only picks unique elements, which is great but does not fit my purpose. Is there a way to allow repetitions in Python, or do I need to manually build a loop?

asked May 08 '18 20:05 by S. W. G.



1 Answer

You can use numpy.random.choice. Its size argument sets how many samples to draw, and its replace argument controls whether sampling is done with replacement (replace=True, the default, allows repeats). Something like the following should work.

import numpy as np
choices = np.random.choice([1, 2, 3], size=10, replace=True)
# array([2, 1, 2, 3, 3, 1, 2, 2, 3, 2])

If your input is a string, say something like my_string = 'abc', you can use:

choices = np.random.choice([char for char in my_string], size=10, replace=True)
# array(['c', 'b', 'b', 'c', 'b', 'a', 'a', 'a', 'c', 'c'], dtype='<U1')

Then get a new string out of it with:

new_string = ''.join(choices)
# 'cbbcbaaacc'
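If you would rather avoid the NumPy dependency, the same idea can be sketched with the standard library's random.choices (Python 3.6+), which samples with replacement by design. The helper name resample_string and its seed parameter are assumptions made for this illustration, not part of either library.

```python
import random


def resample_string(source, length, seed=None):
    """Return `length` characters drawn from `source` with replacement.

    `resample_string` and `seed` are hypothetical names for this sketch.
    """
    # A private Random instance keeps seeding local to this call.
    rng = random.Random(seed)
    # random.choices samples with replacement, so repeats are allowed
    # and `length` may exceed len(source).
    return ''.join(rng.choices(source, k=length))


my_string = 'abc'
new_string = resample_string(my_string, 10)
print(new_string)
```

Passing the same seed twice gives the same output, which can help when you need reproducible test data.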

Performance

Timing the three answers so far, plus random.choices from the comments, on producing 1000 samples from the string 'abc' (skipping the ''.join step, since we all used it), we get:

  • numpy.random.choice([char for char in 'abc'], size=1000, replace=True):

    34.1 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

  • random.choices('abc', k=1000):

    269 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

  • [random.choice('abc') for _ in range(1000)]:

    924 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

  • [random.sample('abc',1)[0] for _ in range(1000)]:

    4.32 ms ± 67.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

NumPy is fastest by far. If you include the ''.join step in the timing, NumPy and random.choices end up neck and neck, both about three times faster than the next fastest approach for this example.
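To check these numbers on your own machine, a rough timing harness along the following lines may help. It is only a sketch: the names candidates and time_candidates are assumptions for this example, the ''.join step is included in every measurement, the NumPy entry is added only if NumPy is installed, and absolute timings will vary with hardware and library versions.

```python
import random
import timeit

N = 1000          # samples per call, as in the comparison above
SOURCE = 'abc'    # input string, as in the comparison above

# Each candidate builds the full output string, including ''.join.
candidates = {
    'random.choices': lambda: ''.join(random.choices(SOURCE, k=N)),
    'random.choice loop': lambda: ''.join(
        random.choice(SOURCE) for _ in range(N)),
    'random.sample loop': lambda: ''.join(
        random.sample(SOURCE, 1)[0] for _ in range(N)),
}

try:
    import numpy as np
    candidates['numpy.random.choice'] = lambda: ''.join(
        np.random.choice(list(SOURCE), size=N, replace=True))
except ImportError:
    pass  # NumPy not installed; time only the standard-library options


def time_candidates(repeat=3, number=50):
    """Return the best observed time per call, in microseconds, by name."""
    results = {}
    for name, func in candidates.items():
        best = min(timeit.repeat(func, repeat=repeat, number=number))
        results[name] = best / number * 1e6
    return results


for name, micros in sorted(time_candidates().items(), key=lambda kv: kv[1]):
    print(f'{name:22s} {micros:10.1f} us per call')
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; it reports a best case rather than a mean, so the figures are not directly comparable to the mean ± std. dev. values quoted above.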

answered Nov 02 '22 18:11 by Engineero