sampling dataframe based on quantile (pandas)

Tags:

python

pandas

I have a data frame that I want to sample based on an argument num_samples. I want to uniformly sample based on Age across quantiles.

For example, if my dataframe has 1000 rows and num_samples = .5, I would need to sample 500 rows in total, 125 from each quantile.

The first few records of my dataframe look like this:

Age  x1 x2 x3
12   1  1  2
45   2  1  3
67   4  1  2
11   3  4  10
18   9  7  6
45   3  5  8
78   8  4  7
64   6  2  3
33   3  2  2

How can I do this in python/pandas?

Eisen asked Nov 15 '25 19:11


1 Answer

Create a column quantile that holds the Age bin for each row. Then use boolean masking and sample to draw rows from each bin, and pd.concat to combine the samples obtained for each bin.

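import pandas as pd
# Assign each row to one of four equal-frequency Age bins
labels = ['q1', 'q2', 'q3', 'q4']
df['quantile'] = pd.qcut(df.Age, q=4, labels=labels)

# Draw one random row from each bin and stack the results
out = pd.concat([df[df['quantile'].eq(label)].sample(1) for label in labels])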

Prints:

>>> out
   Age  x1  x2  x3 quantile
4   18   9   7   6       q1
8   33   3   2   2       q2
7   64   6   2   3       q3
2   67   4   1   2       q4

P.S. To draw n rows from each bin, change sample(1) to sample(n).
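To tie this back to the fractional num_samples in the question, here is a minimal sketch that turns a fraction into an equal per-bin count. The function name sample_by_quantile and its parameters (column, frac, q, random_state) are made up for illustration and are not part of pandas. It assumes the target row count should be split evenly across the bins, and it caps the draw at the bin size when a bin holds fewer rows than requested.

```python
import pandas as pd


def sample_by_quantile(df, column, frac, q=4, random_state=None):
    """Draw roughly frac * len(df) rows, split evenly across q quantile bins.

    Hypothetical helper for illustration only.
    """
    # Rows to draw from each bin, e.g. 1000 rows * 0.5 / 4 bins = 125
    per_bin = int(round(len(df) * frac)) // q

    # Integer bin codes; duplicates="drop" merges bins whose edges coincide
    bins = pd.qcut(df[column], q=q, labels=False, duplicates="drop")

    # Sample each bin separately, never asking for more rows than it holds
    parts = [
        group.sample(n=min(per_bin, len(group)), random_state=random_state)
        for _, group in df.groupby(bins)
    ]
    return pd.concat(parts)
```

Called as sample_by_quantile(df, "Age", 0.5), this would return 500 rows from a 1000-row frame, 125 per bin, provided every bin holds at least 125 rows. With heavily tied Age values the bins can be uneven or fewer than q, so the total may come out smaller.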

Amit Vikram Singh answered Nov 18 '25 08:11


