Pandas groupby and sum total of group

Tags:

python

python-3.x

pandas

dataframe

pandas-groupby

I have a Pandas DataFrame with customer refund reasons. It contains these example data rows:

    case_type           claim_type
1   service             service
2   service             service
3   chargeback          service
4   chargeback          local_charges
5   service             supplier_service
6   chargeback          service
7   chargeback          service
8   chargeback          service
9   chargeback          service
10  chargeback          service
11  service             service_not_used
12  service             service_not_used

I would like to compare the customer's reason with some sort of labeled reason. That part is straightforward, but I would also like to see the total number of records in each group (customer reason).

case_claim_type = df[["case_type", "claim_type"]]
case_claim_type.groupby(by=["case_type", "claim_type"])["case_type"].count()

This gives me output like the following:

case_type         claim_type
service           service                         2
                  supplier_service                1
                  service_not_used                2
chargeback        service                         6
                  local_charges                   1
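A self-contained sketch of the counting step, assuming the twelve example rows above are the whole DataFrame (listed grouped by case_type rather than in the table's row order, which does not affect the counts). It passes the keys as a list and uses `size()`, which counts rows per group; `count()` counts non-null values in the selected column, so the two agree here only because nothing is missing.

```python
import pandas as pd

# The twelve example rows from the question, reordered by case_type
df = pd.DataFrame({
    "case_type": ["service"] * 5 + ["chargeback"] * 7,
    "claim_type": ["service"] * 2 + ["supplier_service"] + ["service_not_used"] * 2
                  + ["service"] * 6 + ["local_charges"],
})

# Group on both columns (keys passed as a list) and count rows per pair
counts = df.groupby(["case_type", "claim_type"]).size()
print(counts)
```

Note that `groupby` sorts the keys by default, so `chargeback` is listed before `service`.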

I would also like to have the sum of these counts per case_type, something like:

case_type         claim_type
service           service                         2
                  supplier_service                1
                  service_not_used                2
                  total:                          5
chargeback        service                         6
                  local_charges                   1
                  total:                          7

It doesn't necessarily have to be in this last output format; a column with the (aggregated) totals per case_type is also fine.
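Since a totals column is acceptable, one option (a sketch, not the approach used in the answer below) is to turn the grouped counts into a DataFrame and broadcast each case_type's sum back onto its rows with `groupby(...).transform("sum")`. The column names `count` and `case_total` are illustrative choices, and the data is the same twelve example rows, reordered.

```python
import pandas as pd

# The twelve example rows from the question, reordered by case_type
df = pd.DataFrame({
    "case_type": ["service"] * 5 + ["chargeback"] * 7,
    "claim_type": ["service"] * 2 + ["supplier_service"] + ["service_not_used"] * 2
                  + ["service"] * 6 + ["local_charges"],
})

# One row per (case_type, claim_type) pair with its record count
counts = df.groupby(["case_type", "claim_type"]).size().reset_index(name="count")

# transform("sum") returns one value per row, aligned to the original index,
# so every pair gets the total for its own case_type
counts["case_total"] = counts.groupby("case_type")["count"].transform("sum")
print(counts)
```

Unlike `agg("sum")`, which collapses each group to a single row, `transform` keeps the original shape, which is what makes it suitable for adding a column.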

asked Feb 20 '18 15:02

eppe2000

1 Answer

Using this sample data:

import pandas as pd

df = pd.DataFrame({'case_type':['Service']*20+['chargeback']*9,'claim_type':['service']*5+['local_charges']*5+['service_not_used']*5+['supplier_service']*5+['service']*8+['local_charges']})

df_out = df.groupby(by=["case_type", "claim_type"])["case_type"].count()

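Let's use pd.concat, a per-case_type sum with groupby(level=0), and assign. (df_out.sum(level=0) does the same in older pandas, but the level argument of sum was removed in pandas 2.0.)

(pd.concat([df_out.to_frame(),
           df_out.groupby(level=0).sum().to_frame()
                 .assign(claim_type="total")
                 .set_index('claim_type', append=True)])
  .sort_index())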

Output:

                             case_type
case_type  claim_type                 
Service    local_charges             5
           service                   5
           service_not_used          5
           supplier_service          5
           total                    20
chargeback local_charges             1
           service                   8
           total                     9
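The final `sort_index()` places each `total` row alphabetically, which works here because "total" sorts after every claim_type in the data. If a claim_type could sort after it (a hypothetical "warranty", say), a sketch like the following appends the total to each group explicitly instead of relying on sort order. It reuses the answer's sample data but counts with `size()`; the column name `count` is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'case_type': ['Service']*20 + ['chargeback']*9,
                   'claim_type': ['service']*5 + ['local_charges']*5 + ['service_not_used']*5
                                 + ['supplier_service']*5 + ['service']*8 + ['local_charges']})

counts = df.groupby(["case_type", "claim_type"]).size()

pieces = []
# Iterating over level 0 yields each case_type with its slice of counts
# (the slice keeps the full (case_type, claim_type) MultiIndex)
for case, group in counts.groupby(level=0):
    total_index = pd.MultiIndex.from_tuples([(case, "total")], names=counts.index.names)
    total = pd.Series([group.sum()], index=total_index)
    # The group's own rows first, then its total row
    pieces.append(pd.concat([group, total]))

result = pd.concat(pieces).to_frame("count")
print(result)
```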

answered Oct 22 '22 03:10

Scott Boston
