Randomly extract x items from a list using python

Starting with two lists such as:

lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

I want the user to enter how many items to extract, as a percentage of the overall list length, and then have the same randomly chosen indices extracted from each list. For example, if I asked for 50%, the output would be:

newLstOne = ['8', '1', '3', '7', '5']
newLstTwo = ['8', '1', '3', '7', '5']

I have achieved this using the following code:

from random import randrange

lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

LengthOfList = len(lstOne)
print LengthOfList

PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")
RangeOfListIndices = []

HowManyIndicesToMake = (float(PercentageToUse)/100)*float(LengthOfList)
print HowManyIndicesToMake

for x in lstOne:
    if len(RangeOfListIndices)==int(HowManyIndicesToMake):
        break
    else:
        random_index = randrange(0,LengthOfList)
        RangeOfListIndices.append(random_index)

print RangeOfListIndices


newlstOne = []
newlstTwo = []

for x in RangeOfListIndices:
    newlstOne.append(lstOne[int(x)])
for x in RangeOfListIndices:
    newlstTwo.append(lstTwo[int(x)])

print newlstOne
print newlstTwo

But I was wondering whether there is a more efficient way of doing this. In my actual use case, this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?

Thank you

asked May 04 '14 17:05

PaulBarr

1 Answer

Q. I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted.

A. The most straightforward approach directly matches your specification:

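import random

percentage = float(raw_input('What percentage? '))
# The sample size must be an integer, so convert after the floor division
k = int(len(list1) * percentage // 100)
# random.sample picks k distinct indices, so no item is extracted twice
indices = random.sample(xrange(len(list1)), k)
new_list1 = [list1[i] for i in indices]
new_list2 = [list2[i] for i in indices]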
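
For readers on Python 3, the same idea can be sketched as a small function. This is only an illustration under stated assumptions: the name subsample_pair and its rng parameter are hypothetical, both lists are assumed to have the same length, and the percentage is rounded down to a whole number of items.

```python
import random


def subsample_pair(list1, list2, percentage, rng=random):
    """Return equally indexed random subsamples of two same-length lists.

    Hypothetical helper for illustration; not part of the original answer.
    """
    if len(list1) != len(list2):
        raise ValueError("both lists must have the same length")
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Round down so the sample never exceeds the requested share.
    k = int(len(list1) * percentage // 100)
    # sample() draws k distinct indices, so no position is picked twice.
    indices = rng.sample(range(len(list1)), k)
    return [list1[i] for i in indices], [list2[i] for i in indices]


lst_one = [str(n) for n in range(1, 11)]
lst_two = [str(n) for n in range(1, 11)]
new_one, new_two = subsample_pair(lst_one, lst_two, 50)
print(new_one)
print(new_two)
```

Because the indices are drawn once and reused for both lists, the two outputs stay aligned position by position. At 145,000 items this does a single pass to pick indices and one list comprehension per list.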

Q. In my actual use case, this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?

A. In Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses the internal _randbelow() method that makes multiple random choices until a bias-free result is found).

In Python 2, the random.sample() function is slightly biased but only in the round-off in the last of 53 bits. In Python 3, the random.sample() function uses the internal _randbelow() method and is bias-free.
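
The "multiple random choices until a bias-free result is found" idea is rejection sampling. The sketch below illustrates the principle only; it is not CPython's source, and the name randbelow_sketch and its rng parameter are hypothetical. It assumes Python 3 and uses only random.getrandbits().

```python
import random


def randbelow_sketch(n, rng=random):
    """Return a uniform random integer in [0, n) by rejection sampling.

    Hypothetical illustration of the principle, not the library's code.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()       # number of bits needed to write n in binary
    r = rng.getrandbits(k)   # uniform over [0, 2**k), and 2**k > n
    while r >= n:            # discard out-of-range draws and try again
        r = rng.getrandbits(k)
    return r


# Every accepted value is equally likely, whatever n is.
draws = [randbelow_sketch(145000) for _ in range(5)]
print(draws)
```

Scaling a float by n (for example int(random.random() * n)) maps a fixed number of equally likely float values onto n buckets, which cannot divide evenly for most n. Discarding out-of-range draws avoids that, at the cost of an occasional retry.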

answered Oct 23 '22 04:10

Raymond Hettinger

Tags: python, list, random, python-internals, indices
