I know how to randomly sample a few rows from a pandas data frame. Let's say I have a data frame df; then to get a fraction of its rows, I can do:
df_sample = df.sample(frac=0.007)
However, what I need is random rows as above AND random columns from the same data frame.
The df is currently 56K x 8.5K. If I want, say, 500 x 1000, where both the 500 rows and the 1000 columns are randomly sampled, how do I do this?
I think one approach would be to do something like
df.columns to get a list of column names.
Then randomly sample indices into this list of columns and use those random indices to filter out the remaining columns?
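The approach described in the question can be sketched roughly as follows. This is only an illustration on a small stand-in frame: the data, the column names c0..c9, the sizes, and the seed are all assumptions made for the example, not part of the original question.

```python
import numpy as np
import pandas as pd

# Small stand-in frame (hypothetical data); the real df is about 56K x 8.5K.
df = pd.DataFrame(
    np.arange(200).reshape(20, 10),
    columns=[f"c{i}" for i in range(10)],
)

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Draw distinct column positions, then keep only those columns.
col_idx = rng.choice(len(df.columns), size=4, replace=False)
df_cols = df.iloc[:, col_idx]

# Rows can then be sampled the usual way.
df_sub = df_cols.sample(n=5, random_state=0)
print(df_sub.shape)
```

This works, but it does by hand what sample can already do for columns.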
Related functions: DataFrameGroupBy.sample generates random samples from each group of a DataFrame object, SeriesGroupBy.sample does the same for each group of a Series object, and numpy.random.choice generates a random sample from a given 1-D numpy array. If frac > 1, replace must be set to True.
Pandas sample() returns a random sample of rows or columns from the data frame it is called on. Parameters: n: int, the number of items to return along the chosen axis. frac: float, the fraction of items to return (about frac * length of the axis). n and frac cannot be used together.
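As a quick illustration of the n and frac parameters, here is a minimal sketch on a made-up 1000 x 8 frame of zeros. The frame and the variable names are assumptions for the example only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 1000 rows, 8 columns.
df = pd.DataFrame(np.zeros((1000, 8)))

by_count = df.sample(n=7)        # exactly 7 rows
by_frac = df.sample(frac=0.007)  # 0.007 * 1000, i.e. 7 rows

# Asking for more rows than exist requires sampling with replacement.
oversampled = df.sample(frac=1.5, replace=True)

# axis=1 samples columns instead of rows.
some_cols = df.sample(n=3, axis=1)

print(len(by_count), len(by_frac), len(oversampled), some_cols.shape)
```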
Just call sample twice, with the corresponding axis parameters:
df.sample(n=500).sample(n=1000, axis=1)
For the first call, axis=0 is the default. The first call samples rows, while the second samples columns.
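Here is the same chained call on a small frame so that it runs quickly. The 60 x 40 random frame, the target size of 5 x 10, and the random_state values are assumptions for illustration; on the real data you would use n=500 and n=1000.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 56K x 8.5K frame.
df = pd.DataFrame(np.random.default_rng(0).random((60, 40)))

# Sample rows first (axis=0 is the default), then columns (axis=1).
# random_state is optional and only makes the draw repeatable.
sub = df.sample(n=5, random_state=1).sample(n=10, axis=1, random_state=2)

print(sub.shape)
```

Sampling is without replacement by default, so the result has no repeated rows or columns, and n cannot exceed the number of rows (or columns) available.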