How to sample on condition with pandas?

I have a dataframe df like the following:

   Col1      Col2
0  1         T
1  1         B 
2  3         S
3  2         A
4  1         C
5  2         A
etc...

I would like to create two dataframes: df1 is a random sample of 10 rows drawn from the rows where Col2 == 'T', and df2 is df minus the rows in df1.

Bob asked Sep 20 '15 18:09



1 Answer

Assuming you have a unique-indexed dataframe (and if you don't, you can simply do .reset_index(), apply this, and then set_index after the fact), you could use DataFrame.sample. [Actually, you should be able to use sample even if the frame didn't have a unique index, but you couldn't use the below method to get df2.]

Note that I'm using A instead of T in this example because A is the only repeated value of Col2 in the example you gave, and I'll only select 1 randomly rather than 10.

>>> df1 = df[df.Col2 == "A"].sample(1)
>>> df2 = df[~df.index.isin(df1.index)]
>>> df1
   Col1 Col2
3     2    A
>>> df2
   Col1 Col2
0     1    T
1     1    B
2     3    S
4     1    C
5     2    A
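
If you want something you can paste and run, here is a minimal sketch of the same idea wrapped in a helper. The function name split_sample and its seed parameter are my own, not part of pandas. It works on row positions rather than index labels, so it should also cope with a non-unique index, and it caps n at the number of matching rows because sample raises a ValueError when asked for more rows than exist (without replacement).

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame(
    {"Col1": [1, 1, 3, 2, 1, 2], "Col2": ["T", "B", "S", "A", "C", "A"]}
)


def split_sample(frame, mask, n, seed=None):
    """Hypothetical helper: return (sampled, rest).

    sampled holds up to n random rows where mask is True,
    rest holds every other row of frame.
    """
    # Renumber rows 0..len-1 so the sampled labels are row positions,
    # which stay unambiguous even if frame has duplicate index labels.
    candidates = frame.reset_index(drop=True)[mask.to_numpy()]
    # Cap n so sample() does not raise when there are fewer matches than n.
    picked = candidates.sample(n=min(n, len(candidates)), random_state=seed)
    in_sample = np.zeros(len(frame), dtype=bool)
    in_sample[picked.index.to_numpy()] = True
    # Both pieces keep the original index labels and row order.
    return frame[in_sample], frame[~in_sample]


# "A" and n=1 as in the answer above; for the question use "T" and n=10.
df1, df2 = split_sample(df, df["Col2"] == "A", n=1, seed=0)
print(df1)
print(df2)
```

Passing a seed only makes the draw repeatable; leave it as None for a fresh sample each time. Note that df1 comes back in the original row order rather than in the order the rows were drawn.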

DSM answered Oct 11 '22 23:10