I have dataset of student's scores for each subject.
StuID Subject Scores 1 Math 90 1 Geo 80 2 Math 70 2 Geo 60 3 Math 50 3 Geo 90
Now I want to count the range of scores for each subject like 0< x <=20, 20< x <=30 and get a dataframe like this:
Subject 0-20 20-40 40-60 60-80 80-100 Math 0 0 1 1 1 Geo 0 0 0 1 2
The given dataset is just a sample of the data I am working on. My dataset has more than 1000 line. How can I do it? Thank you!
Assuming a pandas DataFrame df:
import pandas as pd
bins = list(range(0, 100+1, 20))
# [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
labels = [f'{a}-{b}' for a,b in zip(bins, bins[1:])]
# ['0-10', '10-20', '20-30', '30-40', '40-50', '50-60', '60-70', '70-80', '80-90', '90-100']
out = (pd.crosstab(df['Subject'], pd.cut(df['Scores'],
bins=bins, labels=labels,
ordered=True, right=False))
.reindex(labels, axis=1, fill_value=0)
# .reset_index().rename_axis(columns=None) # optional
)
output:
Scores 0-20 20-40 40-60 60-80 80-100
Subject
Geo 0 0 0 1 2
Math 0 0 1 1 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With