I have a string with about 50 elements. I need to randomize it and generate a much longer string. I found that random.sample()
only picks unique elements, which is great but doesn't fit my purpose. Is there a way to allow repetitions in Python, or do I need to build a loop manually?
You can use random.randint() and random.randrange() to generate random numbers, and these can repeat. To create a list of unique random numbers, use the sample() method instead.
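To make the distinction concrete: `random.sample` draws without replacement, so it cannot return more items than the population holds, whereas `random.choices` (available since Python 3.6) draws with replacement. A minimal sketch; the 5-letter `population` is an arbitrary stand-in for the ~50-element string in the question:

```python
import random

population = "abcde"  # stand-in for the ~50-character string in the question

# sample(): unique picks only, so k may not exceed len(population)
unique = random.sample(population, k=5)

# asking sample() for more items than the population holds raises ValueError
try:
    random.sample(population, k=20)
except ValueError as exc:
    print("sample() failed:", exc)

# choices(): picks with replacement, so k can be as large as needed
repeated = random.choices(population, k=20)

# randrange() also allows repeats; indexing with it is the "manual loop" option
manual = [population[random.randrange(len(population))] for _ in range(20)]

print("".join(unique), "".join(repeated), "".join(manual))
```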
pandas provides a sample() method (DataFrame.sample()) for random sampling. The number of rows to extract can be given in two ways: as an exact count (n) or as a fraction of all rows (frac). It also accepts replace=True, which allows the same row to be drawn more than once.
You can use numpy.random.choice. It has an argument to specify how many samples you want (size) and an argument to specify whether you want replacement (replace). Something like the following should work:
import numpy as np
choices = np.random.choice([1, 2, 3], size=10, replace=True)
# e.g. array([2, 1, 2, 3, 3, 1, 2, 2, 3, 2]); the output differs on every run
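Since NumPy 1.17, the recommended entry point is a `Generator` obtained from `np.random.default_rng()`, whose `choice` method takes the same `size` and `replace` arguments. A sketch, assuming NumPy 1.17 or later; the seed value 42 is arbitrary and only there to make runs reproducible:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducible output

# replace=True is already the default; spelled out here for clarity
choices = rng.choice([1, 2, 3], size=10, replace=True)

# re-seeding with the same value reproduces the same draw
same = np.random.default_rng(42).choice([1, 2, 3], size=10, replace=True)
print(choices, (choices == same).all())
```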
If your input is a string, say my_string = 'abc', you can use:
choices = np.random.choice(list(my_string), size=10, replace=True)
# array(['c', 'b', 'b', 'c', 'b', 'a', 'a', 'a', 'c', 'c'], dtype='<U1')
Then get a new string out of it with:
new_string = ''.join(choices)
# 'cbbcbaaacc'
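If NumPy isn't otherwise needed, the standard library's `random.choices` does the same job for strings and returns plain `str` items, so the result joins directly. A sketch; `random_string` is a hypothetical helper name, and the 50-letter `alphabet` is an assumed stand-in for the asker's string:

```python
import random
import string
from typing import Optional


def random_string(alphabet: str, length: int, seed: Optional[int] = None) -> str:
    """Return `length` characters drawn from `alphabet` with replacement."""
    rng = random.Random(seed)  # independent generator; seed=None uses OS entropy
    return "".join(rng.choices(alphabet, k=length))


# assumed stand-in for the ~50-element source string
alphabet = string.ascii_letters[:50]
long_string = random_string(alphabet, 1000, seed=0)
print(long_string[:40])
```

Using a dedicated `random.Random` instance keeps the seeding local to the helper instead of resetting the module-wide generator.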
Timing the three answers so far, plus random.choices from the comments, on producing 1000 samples from the string 'abc' (skipping the ''.join part since we all used it), we get:
numpy.random.choice([char for char in 'abc'], size=1000, replace=True):
34.1 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

random.choices('abc', k=1000):
269 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

[random.choice('abc') for _ in range(1000)]:
924 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

[random.sample('abc',1)[0] for _ in range(1000)]:
4.32 ms ± 67.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
NumPy is fastest by far. If you include the ''.join step, NumPy and random.choices end up neck and neck, with both about three times faster than the next fastest approach in this example.
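The figures above look like IPython `%timeit` output; outside IPython, the standard `timeit` module gives a rough equivalent. A sketch that times each approach including the `''.join` step; absolute numbers depend on hardware and library versions, so none are claimed here, and `LOOPS` and the repeat count are arbitrary choices:

```python
import random
import timeit

import numpy as np

SOURCE = "abc"
N = 1000
LOOPS = 50  # arbitrary; enough calls per measurement to smooth out noise

# each callable builds an N-character string, including the ''.join step
candidates = {
    "numpy.random.choice": lambda: "".join(
        np.random.choice(list(SOURCE), size=N, replace=True)
    ),
    "random.choices": lambda: "".join(random.choices(SOURCE, k=N)),
    "random.choice loop": lambda: "".join(
        [random.choice(SOURCE) for _ in range(N)]
    ),
    "random.sample loop": lambda: "".join(
        [random.sample(SOURCE, 1)[0] for _ in range(N)]
    ),
}

results = {}
for name, fn in candidates.items():
    # best of 5 repeats, converted to seconds per call
    best = min(timeit.repeat(fn, number=LOOPS, repeat=5)) / LOOPS
    results[name] = best
    print(f"{name:22s} {best * 1e6:8.1f} µs per call")
```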