I have two dataframes. One shows student test results by class on two tests:
import pandas as pd
results = pd.DataFrame({
    'id': [1, 2, 3],
    'class': [1, 1, 2],
    'test_1': [0.67, 0.88, 0.33],
    'test_2': [0.76, 0.63, 0.78]})
results
   id  class  test_1  test_2
0   1      1    0.67    0.76
1   2      1    0.88    0.63
2   3      2    0.33    0.78
The other shows quantiles by class and test, based on previous semesters:
quantiles = pd.DataFrame({
    'class': [1, 2],
    'test_1_0.25': [0.23, 0.31],
    'test_1_0.5': [0.54, 0.67],
    'test_1_0.75': [0.8, 0.9],
    'test_2_0.25': [0.23, 0.31],
    'test_2_0.5': [0.54, 0.67],
    'test_2_0.75': [0.8, 0.9]})
   class  test_1_0.25  test_1_0.5  test_1_0.75  test_2_0.25  test_2_0.5  test_2_0.75
0      1         0.23        0.54          0.8         0.23        0.54          0.8
1      2         0.31        0.67          0.9         0.31        0.67          0.9
I would like to return a dataframe that tells me which quantile each student places in: 0 if they are below the 25% quantile, 1 if below the 50% quantile, 2 if below the 75% quantile, and 3 if above the 75% quantile. So the output would look like this:
   id  test_1_quantile  test_2_quantile
0   1                2                2
1   2                3                1
2   3                1                2
Any help is much appreciated. Thanks
First merge both DataFrames with DataFrame.merge, then loop over the test columns. For each test, use DataFrame.filter to select the score column together with its quantile columns, insert a column of zeros as a floor for scores below the 0.25 quantile, and rename the threshold columns to the output labels 0 to 3. Then compare the thresholds with the score using DataFrame.lt, reverse the column order with iloc, and take the column name of the first True value with idxmax to replace the test column:
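df = pd.merge(results, quantiles, on='class')
for t in results.columns.difference(['id', 'class']):
    #print (t)
    # score column plus its three quantile columns
    df1 = df.filter(like=t)
    # zero floor so scores below the 0.25 quantile map to label 0
    df1.insert(1, t + '_0', 0)
    df1.columns = [t] + list(range(4))
    #print (df1)
    # highest label whose threshold is strictly below the score
    a = df1.iloc[:, 1:].lt(df1[t], axis=0).iloc[:, ::-1].idxmax(axis=1)
    df[t] = a
print (df[results.columns])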
   id  class  test_1  test_2
0   1      1       2       2
1   2      1       3       2
2   3      2       1       2
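As a cross-check, here is a minimal sketch of a shorter variant that counts how many quantile cutoffs lie strictly below each score instead of using idxmax. It is not the method from the answer above, only an alternative under two assumptions: the cutoffs increase from 0.25 to 0.75 within each class, and a score exactly equal to a cutoff counts as below it. The names merged, out and cutoff_cols are made up for this example, and the output columns use the test_1_quantile / test_2_quantile names from the question.

```python
import pandas as pd

results = pd.DataFrame({
    'id': [1, 2, 3],
    'class': [1, 1, 2],
    'test_1': [0.67, 0.88, 0.33],
    'test_2': [0.76, 0.63, 0.78]})

quantiles = pd.DataFrame({
    'class': [1, 2],
    'test_1_0.25': [0.23, 0.31],
    'test_1_0.5': [0.54, 0.67],
    'test_1_0.75': [0.8, 0.9],
    'test_2_0.25': [0.23, 0.31],
    'test_2_0.5': [0.54, 0.67],
    'test_2_0.75': [0.8, 0.9]})

# A left merge keeps one row per student, in the original order.
merged = results.merge(quantiles, on='class', how='left')

out = merged[['id']].copy()
for test in ['test_1', 'test_2']:
    cutoff_cols = [f'{test}_{q}' for q in ('0.25', '0.5', '0.75')]
    # True where a cutoff is strictly below the score; the row sum
    # (0 to 3) is the number of cutoffs the score exceeds.
    below = merged[cutoff_cols].lt(merged[test], axis=0)
    out[f'{test}_quantile'] = below.sum(axis=1)

print(out)
```

On the sample data this gives 2, 3, 1 for test_1 and 2, 2, 2 for test_2, the same labels as the loop above.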