
Sampling with repetition in Python

Tags:

python

I have a string of about 50 elements. I need to randomize it and generate a much longer string. I found that random.sample() only picks unique elements, which is great but does not fit my purpose. Is there a way to allow repetitions in Python, or do I need to build a loop manually?

566

asked May 08 '18 20:05

S. W. G.

1 Answer

You can use numpy.random.choice. Its size argument sets how many samples to draw, and its replace argument controls whether sampling is done with replacement. Something like the following should work:

import numpy as np
choices = np.random.choice([1, 2, 3], size=10, replace=True)
# array([2, 1, 2, 3, 3, 1, 2, 2, 3, 2])

If your input is a string, say something like my_string = 'abc', you can use:

choices = np.random.choice(list(my_string), size=10, replace=True)
# array(['c', 'b', 'b', 'c', 'b', 'a', 'a', 'a', 'c', 'c'], dtype='<U1')

Then get a new string out of it with:

new_string = ''.join(choices)
# 'cbbcbaaacc'
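
If you would rather avoid the NumPy dependency, the same idea can be sketched with the standard library's random.choices (Python 3.6+), which samples with replacement by default. This is only an illustrative sketch: the function name lengthen, its seed parameter, and the 50-character stand-in alphabet are assumptions made for the example, not part of the original question.

```python
import random
import string


def lengthen(source, length, seed=None):
    """Return `length` characters drawn from `source` with replacement.

    `lengthen` and `seed` are hypothetical names used for illustration.
    """
    rng = random.Random(seed)  # private generator; pass a seed only for reproducibility
    return ''.join(rng.choices(source, k=length))


# Stand-in for the 50-ish element string from the question (an assumption).
alphabet = string.ascii_letters[:50]

result = lengthen(alphabet, 500, seed=0)
print(len(result))  # 500
```

Using a private random.Random instance keeps the seeding local, so other code that relies on the module-level generator is not affected.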

Performance

I timed the three answers posted so far, plus random.choices from the comments, each producing 1000 samples from the string 'abc'. The ''.join step is excluded because every approach uses it:

numpy.random.choice([char for char in 'abc'], size=1000, replace=True):
    34.1 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

random.choices('abc', k=1000):
    269 µs ± 4.27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

[random.choice('abc') for _ in range(1000)]:
    924 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

[random.sample('abc',1)[0] for _ in range(1000)]:
    4.32 ms ± 67.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

NumPy is the fastest by far, roughly eight times faster than random.choices in this test. Once the ''.join step is included, however, NumPy and random.choices run neck and neck, and both are about three times faster than the next-fastest approach in this example.
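
To check the comparison on your own machine with the ''.join step included, something like the following timeit sketch could be used. It is not the harness behind the numbers above (those read like IPython %timeit output). The candidates and results names and the repeat counts are arbitrary choices for illustration, and absolute timings will vary with hardware and with Python and NumPy versions. NumPy is treated as optional so the script still runs without it.

```python
import random
import timeit

try:
    import numpy as np
except ImportError:  # NumPy is optional for this sketch
    np = None

SOURCE = 'abc'
K = 1000

# Each candidate builds a K-character string, including the ''.join step.
candidates = {
    'random.choices': lambda: ''.join(random.choices(SOURCE, k=K)),
    'random.choice loop': lambda: ''.join(random.choice(SOURCE) for _ in range(K)),
    'random.sample loop': lambda: ''.join(random.sample(SOURCE, 1)[0] for _ in range(K)),
}
if np is not None:
    candidates['numpy.random.choice'] = lambda: ''.join(
        np.random.choice(list(SOURCE), size=K, replace=True)
    )

results = {}
for name, func in candidates.items():
    # Best of 5 repeats of 100 calls each, converted to microseconds per call.
    best = min(timeit.repeat(func, number=100, repeat=5))
    results[name] = best / 100 * 1e6
    print(f'{name}: {results[name]:.1f} us per call')
```

Taking the minimum over several repeats is a common way to reduce noise from other processes. It reports a best case rather than the mean ± std. dev. figures quoted above.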

176

answered Nov 02 '22 18:11

Engineero
