Select rows randomly based on condition pandas python

I have a small test data sample:

import pandas as pd

df = {'ID': ['H900','H901','H902','','M1435','M149','M157','','M699','M920','','M789','M617','M991','H903','M730','M191'],
  'Clone': [0,1,2,2,2,2,2,2,3,3,3,4,4,4,5,5,6],
  'Length': [48,42  ,48,48,48,48,48,48,48,48,48,48,48,48,48,48,48]}

df = pd.DataFrame(df)

It looks like this:

df
Out[4]: 
      Clone   ID  Length
0       0   H900      48
1       1   H901      42
2       2   H902      48
3       2             48
4       2  M1435      48
5       2   M149      48
6       2   M157      48
7       2             48
8       3   M699      48
9       3   M920      48
10      3             48
11      4   M789      48
12      4   M617      48
13      4   M991      48
14      5   H903      48
15      5   M730      48
16      6   M191      48

I want a simple script that randomly picks a given number of rows (for example, 5), but only from the rows that contain an ID. It should never include a row whose ID is empty.

My script:

import pandas as pd
import numpy as np

df = {'ID': ['H900','H901','H902','','M1435','M149','M157','','M699','M920','','M789','M617','M991','H903','M730','M191'],
  'Clone': [0,1,2,2,2,2,2,2,3,3,3,4,4,4,5,5,6],
  'Length': [48,42  ,48,48,48,48,48,48,48,48,48,48,48,48,48,48,48]}

df = pd.DataFrame(df)

rows = np.random.choice(df.index.values, 5)
# df.ix was removed in pandas 1.0; df.loc is the label-based equivalent
sampled_df = df.loc[rows]

sampled_df.to_csv('sampled_df.txt', sep = '\t', index=False)

But this script sometimes picks rows that do not contain an ID.

asked Jun 02 '16 13:06

Jessica

2 Answers

I think you need to filter out the rows with an empty ID using boolean indexing before sampling:

import pandas as pd
import numpy as np

df = {'ID': ['H900','H901','H902','','M1435','M149','M157','','M699','M920','','M789','M617','M991','H903','M730','M191'],
  'Clone': [0,1,2,2,2,2,2,2,3,3,3,4,4,4,5,5,6],
  'Length': [48,42  ,48,48,48,48,48,48,48,48,48,48,48,48,48,48,48]}

df = pd.DataFrame(df)
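print(df)

# Keep only the rows whose ID is not an empty string (boolean indexing)
df = df[df.ID != '']

# Draw 5 index labels; replace=False prevents the same row being picked twice
rows = np.random.choice(df.index.values, 5, replace=False)
sampled_df = df.loc[rows]
print(sampled_df)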
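
One detail worth spelling out: np.random.choice samples with replacement unless told otherwise, so the same row can be drawn more than once. The following is a minimal sketch, assuming distinct rows are wanted. The shortened data, the has_id name, and the seed value 0 are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd

# Shortened version of the sample data (illustrative only)
df = pd.DataFrame({
    'ID': ['H900', 'H901', '', 'M1435', 'M149', '', 'M699', 'M920'],
    'Clone': [0, 1, 2, 2, 2, 2, 3, 3],
})

# Boolean mask: True where the ID is not an empty string
has_id = df['ID'] != ''

# Seeded generator so the draw is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(0)

# replace=False guarantees 5 distinct index labels among the rows with an ID
rows = rng.choice(df.index[has_id].to_numpy(), size=5, replace=False)
sampled_df = df.loc[rows]
print(sampled_df)
```

Note that with replace=False the requested size cannot exceed the number of rows that have an ID; NumPy raises a ValueError in that case.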

answered Oct 07 '22 06:10

jezrael

You can also filter with query and then draw the rows with sample, which samples without replacement by default:

df = df.query('(ID != "")').sample(n=5)
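
To tie this back to the question's script, here is a rough end-to-end sketch of filtering, sampling, and producing tab-separated output. The helper name sample_rows_with_id, the seed argument, and the decision to treat missing or whitespace-only values as "no ID" are assumptions made for illustration; the data is a shortened copy of the sample.

```python
import pandas as pd

def sample_rows_with_id(df, n, seed=None):
    """Return n distinct random rows whose ID is non-empty (hypothetical helper)."""
    # Assumption: missing values and whitespace-only strings count as "no ID"
    ids = df['ID'].fillna('').astype(str).str.strip()
    candidates = df[ids != '']
    # DataFrame.sample draws without replacement by default
    return candidates.sample(n=n, random_state=seed)

df = pd.DataFrame({
    'ID': ['H900', 'H901', 'H902', '', 'M1435', 'M149', 'M157', ''],
    'Clone': [0, 1, 2, 2, 2, 2, 2, 2],
    'Length': [48, 42, 48, 48, 48, 48, 48, 48],
})

sampled_df = sample_rows_with_id(df, n=5, seed=0)

# With no path argument, to_csv returns the text instead of writing a file
tsv_text = sampled_df.to_csv(sep='\t', index=False)
print(tsv_text)
```

Passing a file name such as 'sampled_df.txt' as the first argument of to_csv writes the same text to disk, as in the original script.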

answered Oct 07 '22 04:10

DataBach

Tags: python, random, pandas
