I have a Pandas DataFrame with customer refund reasons. It contains these example data rows:
**case_type** **claim_type**
1 service service
2 service service
3 chargeback service
4 chargeback local_charges
5 service supplier_service
6 chargeback service
7 chargeback service
8 chargeback service
9 chargeback service
10 chargeback service
11 service service_not_used
12 service service_not_used
I would like to compare the customer's reason with some sort of labeled reason. This is no problem, but I would also like to see the total number of records in a specific group (customer reason).
case_claim_type = df[["case_type", "claim_type"]]
case_claim_type.groupby(by=["case_type", "claim_type"])["case_type"].count()
Which gives me this output, for example:
**case_type** **claim_type**
service service 2
supplier_service 1
service_not_used 2
chargeback service 6
local_charges 1
I would also like to have the sum of the output per case_type. Something like:
**case_type** **claim_type**
service service 2
supplier_service 1
service_not_used 2
total: 5
chargeback service 6
local_charges 1
total: 7
It doesn't necessarily have to be in this last output format; a column with the (aggregated) totals per case_type is also fine.
To create a new column for the output of groupby.sum(), first apply the groupby.sum() operation, then store the result in a new column.
Use count() by column name: use pandas DataFrame.groupby() to group the rows by a column, then use the count() method to get the count for each group, ignoring None and NaN values. It works with non-floating-point data as well.
Pandas DataFrame count() method: count() counts the non-empty values in each column, or in each row if you specify the axis parameter as axis='columns', and returns a Series with the result for each column (or row).
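As a rough sketch of the "totals in a new column" idea, the snippet below rebuilds the twelve example rows from the question, counts each (case_type, claim_type) pair, and then broadcasts the per-case_type sum back onto every row with transform("sum"). The column names "n" and "total" are illustrative choices, not part of the original question.

```python
import pandas as pd

# Sample data copied from the example rows in the question.
df = pd.DataFrame({
    "case_type": ["service", "service", "chargeback", "chargeback", "service",
                  "chargeback", "chargeback", "chargeback", "chargeback",
                  "chargeback", "service", "service"],
    "claim_type": ["service", "service", "service", "local_charges",
                   "supplier_service", "service", "service", "service",
                   "service", "service", "service_not_used", "service_not_used"],
})

# Count rows per (case_type, claim_type) pair; "n" is an illustrative name.
counts = df.groupby(["case_type", "claim_type"]).size().reset_index(name="n")

# Broadcast each case_type's total back onto its rows ("total" is illustrative).
counts["total"] = counts.groupby("case_type")["n"].transform("sum")

print(counts)
```

Because transform returns a result aligned with the original rows, the total repeats on every row of the same case_type, which makes it easy to compute shares such as counts["n"] / counts["total"] afterwards.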
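To make the difference between the counting methods concrete, here is a small sketch on a made-up three-row frame (the name toy and its contents are assumptions for illustration only). It contrasts count(), which skips missing values, with size(), which counts every row in the group, and shows how the axis parameter of DataFrame.count() switches between per-column and per-row counts.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing claim_type.
toy = pd.DataFrame({
    "case_type": ["service", "service", "chargeback"],
    "claim_type": ["service", np.nan, "service"],
})

# count() ignores NaN/None within each group.
per_group_count = toy.groupby("case_type")["claim_type"].count()

# size() counts all rows in each group, missing values included.
per_group_size = toy.groupby("case_type").size()

# DataFrame.count() defaults to one result per column...
per_column = toy.count()

# ...and gives one result per row with axis="columns".
per_row = toy.count(axis="columns")

print(per_group_count, per_group_size, per_column, per_row, sep="\n\n")
```

If the grouped column can contain missing values and every row should be counted, size() is probably the safer choice; count() is the one to use when missing entries should be left out.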
Where:
df = pd.DataFrame({'case_type':['Service']*20+['chargeback']*9,'claim_type':['service']*5+['local_charges']*5+['service_not_used']*5+['supplier_service']*5+['service']*8+['local_charges']})
df_out = df.groupby(by=["case_type", "claim_type"])["case_type"].count()
Let's use pd.concat, a sum over index level 0, and assign:
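# Series.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
(pd.concat([df_out.to_frame(),
            df_out.groupby(level=0).sum().to_frame()
                  .assign(claim_type="total")
                  .set_index('claim_type', append=True)])
   .sort_index())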
Output:
case_type
case_type claim_type
Service local_charges 5
service 5
service_not_used 5
supplier_service 5
total 20
chargeback local_charges 1
service 8
total 9
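If the stacked layout is not required, a possible alternative (not part of the answer above) is pd.crosstab with margins=True, which adds a totals row and column in one call. The sketch below reuses the answer's sample frame; margins_name="total" is just a label chosen here for illustration.

```python
import pandas as pd

# Same sample frame as in the answer above.
df = pd.DataFrame({
    "case_type": ["Service"] * 20 + ["chargeback"] * 9,
    "claim_type": (["service"] * 5 + ["local_charges"] * 5
                   + ["service_not_used"] * 5 + ["supplier_service"] * 5
                   + ["service"] * 8 + ["local_charges"]),
})

# Wide table of counts; margins=True appends a totals row and a totals column.
table = pd.crosstab(df["case_type"], df["claim_type"],
                    margins=True, margins_name="total")

print(table)
```

This gives one row per case_type and one column per claim_type, so combinations that never occur show up as 0 rather than being absent.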