I want to draw a random sample in Python from the following df such that at least 65% of the rows in the resulting sample have color yellow and the cumulative sum of the selected quantities is less than or equal to 18.
Original Dataset:
Date Id color qty
02-03-2018 A red 5
03-03-2018 B blue 2
03-03-2018 C green 3
04-03-2018 D yellow 4
04-03-2018 E yellow 7
04-03-2018 G yellow 6
04-03-2018 H orange 8
05-03-2018 I yellow 1
06-03-2018 J yellow 5
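To experiment with the code below, the table above can be loaded into pandas roughly like this. It is a sketch: the dates are kept as plain strings because the table does not say whether they are day-first or month-first, and `df` is just an alias because the answer's UPDATE calls the frame `df`.

```python
import pandas as pd

# The "Original Dataset" table from the question, one list per column.
df1 = pd.DataFrame({
    "Date": ["02-03-2018", "03-03-2018", "03-03-2018", "04-03-2018", "04-03-2018",
             "04-03-2018", "04-03-2018", "05-03-2018", "06-03-2018"],
    "Id": ["A", "B", "C", "D", "E", "G", "H", "I", "J"],
    "color": ["red", "blue", "green", "yellow", "yellow",
              "yellow", "orange", "yellow", "yellow"],
    "qty": [5, 2, 3, 4, 7, 6, 8, 1, 5],
})
df = df1  # alias: the answer's UPDATE refers to the same frame as `df`
```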
I have the total-quantity condition covered, but I am stuck on how to integrate the percentage condition:
df2 = df1.sample(frac=1)  # shuffle all rows
df3 = df2[df2.qty.cumsum() <= 18]  # keep rows while the running qty total is <= 18
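One way to add the percentage check to this shuffle-and-cumsum idea is rejection sampling: repeat the shuffle until the kept rows also have a high enough yellow share. This is a sketch, not part of the original question; `shuffled_prefix_sample`, `max_tries` and `seed` are hypothetical names, and it assumes every `qty` is positive.

```python
import numpy as np
import pandas as pd


def shuffled_prefix_sample(df, max_qty=18, min_yellow=0.65, max_tries=1000, seed=None):
    # Hypothetical helper: repeat the shuffle-and-cumsum step and accept the
    # first result whose yellow share is also high enough.
    rng = np.random.RandomState(seed)
    for _ in range(max_tries):
        shuffled = df.sample(frac=1, random_state=rng)  # random row order
        # Assuming qty is positive, the running total only grows, so this keeps a prefix.
        kept = shuffled[shuffled["qty"].cumsum() <= max_qty]
        if len(kept) > 0 and (kept["color"] == "yellow").mean() >= min_yellow:
            return kept
    raise ValueError("no sample met both conditions within max_tries")
```

Because it keeps the longest qualifying prefix of each shuffle, the number of rows returned is not fixed in advance.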
Required dataset:
Date Id color qty
03-03-2018 B blue 2
04-03-2018 D yellow 4
04-03-2018 G yellow 6
06-03-2018 J yellow 5
Or something like this:
Date Id color qty
02-03-2018 A red 5
04-03-2018 D yellow 4
04-03-2018 E yellow 7
05-03-2018 I yellow 1
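Both required outputs satisfy the constraints: in each, 3 of the 4 rows are yellow (75%, at least 65%) and the quantities add up to 17 (at most 18). A small checker, here a hypothetical helper `meets_constraints` that is not from the question, makes that test reusable for any sampling approach:

```python
import pandas as pd


def meets_constraints(sample, max_qty=18, min_yellow=0.65):
    # True if the sample is non-empty, its total qty is within the limit,
    # and yellow rows make up at least min_yellow of it.
    if sample.empty:
        return False
    total_ok = sample["qty"].sum() <= max_qty
    yellow_share = (sample["color"] == "yellow").mean()
    return bool(total_ok and yellow_share >= min_yellow)


# The two "Required dataset" examples from the question.
first = pd.DataFrame({"Id": ["B", "D", "G", "J"],
                      "color": ["blue", "yellow", "yellow", "yellow"],
                      "qty": [2, 4, 6, 5]})
second = pd.DataFrame({"Id": ["A", "D", "E", "I"],
                       "color": ["red", "yellow", "yellow", "yellow"],
                       "qty": [5, 4, 7, 1]})
print(meets_constraints(first), meets_constraints(second))
```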
Any help would be really appreciated!
Thanks in advance.
Filter the rows with color 'yellow' and draw a random sample that makes up at least 65% of your total sample size:
import math
import random
import pandas as pd
sample_size = 4  # hypothetical total number of rows to draw; choose your own
yellow_size = random.randint(65, 100) / 100  # yellow share between 65% and 100%
n_yellow = math.ceil(yellow_size * sample_size)  # sample() needs an int; round up to keep >= 65%
df_yellow = df3[df3['color'] == 'yellow'].sample(n=n_yellow)
Filter the rows with other colors and draw a random sample for the remainder of your sample size.
n_others = sample_size - n_yellow
df_others = df3[df3['color'] != 'yellow'].sample(n=n_others)
Combine both samples and shuffle the rows.
df_sample = pd.concat([df_yellow, df_others]).sample(frac=1)
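`DataFrame.sample` requires an integer `n`, so `yellow_size * sample_size` has to be rounded before it is passed in. Rounding down can break the 65% floor (for a sample of 4, 0.65 * 4 = 2.6, and 2 yellow rows out of 4 is only 50%), while rounding up cannot, as long as the other colors only fill what is left. A quick check of that arithmetic, using a hypothetical helper `split_counts`:

```python
import math


def split_counts(sample_size, yellow_share=0.65):
    # Round the yellow count up so the share never drops below yellow_share;
    # the other colors get whatever remains.
    n_yellow = math.ceil(yellow_share * sample_size)
    return n_yellow, sample_size - n_yellow


for size in (3, 4, 10):
    n_yellow, n_others = split_counts(size)
    print(size, n_yellow, n_others, n_yellow / size)
```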
UPDATE:
If you want to check for both conditions simultaneously, this could be one way to do it:
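import math
import random
import pandas as pd
sample_size = 4  # hypothetical total number of rows to draw; choose your own
df_sample = df  # start from the full frame so the loop body runs at least once
while df_sample['qty'].sum() > 18:  # redraw until the total qty is <= 18
    yellow_size = random.randint(65, 100) / 100  # yellow share between 65% and 100%
    n_yellow = math.ceil(yellow_size * sample_size)  # sample() needs an int; round up to keep >= 65%
    df_yellow = df[df['color'] == 'yellow'].sample(n=n_yellow)
    df_others = df[df['color'] != 'yellow'].sample(n=sample_size - n_yellow)
    df_sample = pd.concat([df_yellow, df_others]).sample(frac=1)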
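The loop above keeps redrawing forever if no draw can satisfy the quantity limit, for instance when the sample size is too large for the data. A minimal sketch of the same idea with a retry cap, wrapped in a hypothetical function `sample_with_quota`. It draws the yellow share with `random.uniform` instead of `random.randint`, which is an arbitrary choice, and skips splits that need more rows of one kind than exist.

```python
import math
import random

import pandas as pd


def sample_with_quota(df, sample_size, max_qty=18, min_yellow=0.65, max_tries=1000):
    # Hypothetical helper: fixed-size sample with a yellow quota, redrawn
    # until the total qty is within max_qty (rejection sampling with a cap).
    yellow = df[df["color"] == "yellow"]
    others = df[df["color"] != "yellow"]
    for _ in range(max_tries):
        share = random.uniform(min_yellow, 1.0)  # target yellow share in [min_yellow, 1]
        n_yellow = math.ceil(share * sample_size)  # round up so the share stays >= min_yellow
        n_others = sample_size - n_yellow
        if n_yellow > len(yellow) or n_others > len(others):
            continue  # not enough rows of one kind for this split
        picked = pd.concat([yellow.sample(n=n_yellow), others.sample(n=n_others)])
        if picked["qty"].sum() <= max_qty:
            return picked.sample(frac=1)  # shuffle the combined rows
    raise ValueError("no sample met both conditions within max_tries")
```

For example, `sample_with_quota(df, sample_size=4)` returns four rows, at least three of them yellow, with a total qty of at most 18; asking for 9 rows raises `ValueError` because the data has only 5 yellow rows.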