
How to randomly select rows from a data set using pandas?

I have a data set with 36k rows. I want to randomly select 9k rows from it using pandas. How do I accomplish this task?

Niranjan Agnihotri asked Mar 28 '17 06:03


1 Answer

I think you can use sample, either with an absolute row count (9,000) or with a fraction of the rows (25% of 36k):

df.sample(n=9000)

Or:

df.sample(frac=0.25)
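
As a minimal sketch of both calls, assuming a stand-in DataFrame in place of the real 36k-row data set (the `value` column and the seed are made up for illustration). `sample` draws without replacement by default, and `random_state` makes the draw repeatable:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 36k-row data set from the question
df = pd.DataFrame({"value": np.arange(36000)})

# Draw 9000 distinct rows; random_state fixes the seed so the draw repeats
by_count = df.sample(n=9000, random_state=0)

# The same sample size expressed as a fraction of the rows
by_frac = df.sample(frac=0.25, random_state=0)

print(len(by_count), len(by_frac))
```

Drop `random_state` if a different sample is wanted on every run.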

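Another solution is to draw a random sample of index labels with numpy.random.choice and then select them with loc. The index has to be unique, and replace=False is needed because choice samples with replacement by default, which would return duplicate rows:

df = df.loc[np.random.choice(df.index, size=9000, replace=False)]

Solution if the index is not unique (select by position with iloc instead):

df = df.iloc[np.random.choice(np.arange(len(df)), size=9000, replace=False)]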
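
A self-contained sketch of the positional variant, assuming a made-up DataFrame whose index is deliberately non-unique. It uses NumPy's seeded `default_rng` generator rather than the legacy `np.random.choice`, which is my substitution and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical data set with a deliberately non-unique index (labels 0..99 repeat)
df = pd.DataFrame({"value": np.arange(36000)}, index=np.arange(36000) % 100)

rng = np.random.default_rng(0)  # seeded generator, for repeatable draws

# Pick 9000 distinct row positions, then select those rows by position
positions = rng.choice(len(df), size=9000, replace=False)
subset = df.iloc[positions]

print(subset.shape)
```

Because the rows are selected by position, duplicate index labels do not matter here; each of the 9000 rows is a different row of the original frame.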
jezrael answered Sep 28 '22 00:09