Random Sampling of Pandas data frame (both rows and columns)

I know how to randomly sample a few rows from a pandas data frame. Let's say I have a data frame df; to get a fraction of its rows, I can do:

df_sample = df.sample(frac=0.007)
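As a quick illustration of the row-sampling call above, here is a minimal sketch on a small synthetic frame. The data and the random_state value are assumptions for demonstration only; random_state just makes the draw repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: 1000 rows x 5 columns.
df = pd.DataFrame(np.arange(5000).reshape(1000, 5), columns=list("abcde"))

# frac is a fraction of the rows (axis=0 is the default): 0.007 * 1000 = 7 rows.
# random_state fixes the seed so the same rows come back on every run.
df_sample = df.sample(frac=0.007, random_state=0)

print(df_sample.shape)  # (7, 5)
```

All columns are kept; only the rows are subsampled.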

However, what I need is random rows, as above, AND random columns from the same data frame.

df is currently 56K x 8.5K (rows x columns). If I want, say, a 500 x 1000 subset where both the 500 rows and the 1000 columns are randomly sampled, how do I do this?

I think one approach would be to do something like this:

use df.columns to get a list of column names,

then randomly sample indices from this list of columns and use those random indices to filter out the remaining columns?
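The index-based approach described in the question can be sketched with NumPy's random Generator. This is a minimal sketch under assumptions: the frame below is a hypothetical scaled-down stand-in (2000 x 300), and the seed and sample sizes are arbitrary. Row and column positions are drawn without replacement and passed to iloc in one step.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: 2000 rows x 300 named columns.
df = pd.DataFrame(np.zeros((2000, 300)), columns=[f"c{i}" for i in range(300)])

rng = np.random.default_rng(seed=42)

# Draw 500 distinct row positions and 100 distinct column positions.
row_pos = rng.choice(len(df), size=500, replace=False)
col_pos = rng.choice(len(df.columns), size=100, replace=False)

# iloc with two position arrays selects every (row, column) combination.
sub = df.iloc[row_pos, col_pos]

# The same column subset expressed as labels, if names are preferred.
col_names = df.columns[col_pos]

print(sub.shape)  # (500, 100)
```

Selecting both axes in a single iloc call avoids building an intermediate frame.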

asked Jun 28 '16 at 22:06 by Baktaawar


People also ask

Is DataFrame sample random?

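Yes. DataFrame.sample returns a random sample of items from an axis of the object. Related functions: DataFrameGroupBy.sample generates random samples from each group of a DataFrame, SeriesGroupBy.sample does the same for each group of a Series, and numpy.random.choice generates a random sample from a given 1-D NumPy array. If frac > 1, replace must be set to True.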
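To illustrate the frac > 1 rule, here is a minimal sketch on a tiny hypothetical frame: upsampling works with replace=True, while the same call with the default replace=False raises a ValueError (the exact message text varies between pandas versions, so it is not reproduced here).

```python
import pandas as pd

# Hypothetical 4-row frame.
df = pd.DataFrame({"x": [1, 2, 3, 4]})

# Upsampling (frac > 1) draws 2 * 4 = 8 rows, so rows must be allowed to repeat.
up = df.sample(frac=2, replace=True, random_state=1)
print(len(up))  # 8

# Without replace=True, pandas refuses to draw more rows than exist.
raised = False
try:
    df.sample(frac=2)  # replace defaults to False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```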

How do you sample columns in Python?

pandas sample() returns a random sample of rows or columns from the calling data frame. Parameters: n (int): number of items to return from the chosen axis. frac (float): fraction of items to return, i.e. frac times the length of that axis; it cannot be combined with n. axis: 0 or 'index' (the default) samples rows, 1 or 'columns' samples columns.
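A minimal sketch of column sampling with the parameters described above, using a hypothetical 3 x 6 frame; the random_state values are arbitrary. Note that n and frac both count along whichever axis is selected.

```python
import pandas as pd

# Hypothetical frame: 3 rows x 6 columns named a..f.
df = pd.DataFrame({c: range(3) for c in "abcdef"})

# axis=1 samples columns; n=2 means two columns, all rows are kept.
cols = df.sample(n=2, axis=1, random_state=0)
print(cols.shape)  # (3, 2)

# frac is relative to the chosen axis: 0.5 * 6 columns = 3 columns.
half = df.sample(frac=0.5, axis="columns", random_state=0)
print(half.shape)  # (3, 3)
```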


1 Answer

Just call sample twice, with corresponding axis parameters:

df.sample(n=500).sample(n=1000, axis=1)

For the first call, axis=0 is the default, so it samples rows; the second call, with axis=1, samples columns.
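Here is a runnable sketch of the chained call, assuming a hypothetical scaled-down frame (1000 x 1200 random floats) in place of the 56K x 8.5K original. The random_state arguments are an addition so the subset is reproducible; without them each run returns a different subset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the large frame, scaled down to 1000 x 1200.
df = pd.DataFrame(
    np.random.default_rng(0).random((1000, 1200)),
    columns=[f"f{i}" for i in range(1200)],
)

# First call: axis=0 (default) draws 500 rows.
# Second call: axis=1 draws 1000 columns from that 500-row frame.
sub = df.sample(n=500, random_state=1).sample(n=1000, axis=1, random_state=2)

print(sub.shape)  # (500, 1000)
```

Because rows are sampled first, the column draw operates on the smaller 500-row frame rather than the full one; the values themselves are unchanged, only the rows and columns are subset.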

answered Oct 08 '22 at 10:10 by ayhan