Pandas division of two columns with groupby

This is obviously simple, but as a pandas newbie I'm getting stuck.

I have a CSV file that contains 3 columns: state, bene_1_count, and bene_2_count.

I want to calculate the ratio of 'bene_1_count' to 'bene_2_count' for each state.

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
           'bene_1_count': [np.random.randint(10000, 99999)
                     for _ in range(12)],
           'bene_2_count': [np.random.randint(10000, 99999)
                     for _ in range(12)]})

I am trying the following, but it is giving me an error: 'No objects to concatenate'

df['ratio'] = df.groupby(['state']).agg(df['bene_1_count']/df['bene_2_count'])

I am not able to figure out how to "reach up" to the state level of the groupby to take the ratio of columns.

I want the ratio of the two columns per state, with output like this:

    State       ratio

    CA  
    WA  
    CO  
    AZ

asked Feb 04 '17 23:02

Sanjeev

1 Answer

Stated alternatively: you can write a custom function that accepts a DataFrame. Iterating a groupby yields one sub-DataFrame per group, and apply calls your custom function on each of them.
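To make the "sub-dataframes" idea concrete, here is a minimal sketch. The small frame `demo` and its numbers are invented for illustration and are not the question's data. Iterating over a groupby yields (key, sub-DataFrame) pairs, which are the pieces apply hands to your function one at a time.

```python
import pandas as pd

# Illustrative data only; the values are invented for this sketch.
demo = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                     'bene_1_count': [10, 20, 30, 40],
                     'bene_2_count': [5, 10, 15, 20]})

# Each iteration yields the group key and the rows belonging to that group.
group_sizes = {}
for state, df_sub in demo.groupby('state'):
    group_sizes[state] = len(df_sub)
    print(state, df_sub['bene_1_count'].tolist())
```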

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
           'bene_1_count': [np.random.randint(10000, 99999)
                     for _ in range(12)],
           'bene_2_count': [np.random.randint(10000, 99999)
                     for _ in range(12)]})

def divide_two_cols(df_sub):
    return df_sub['bene_1_count'].sum() / float(df_sub['bene_2_count'].sum())

df.groupby('state').apply(divide_two_cols)
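If the goal is the ratio of group totals (which is what the function above computes), one possible alternative is to sum both columns per state first and then divide the two summed columns, without a custom function. This is only a sketch: the frame `demo` and its values are invented so the result is easy to check by hand.

```python
import pandas as pd

# Illustrative data only; the values are invented for this sketch.
demo = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                     'bene_1_count': [10, 20, 30, 40],
                     'bene_2_count': [5, 10, 15, 30]})

# Sum both columns per state, then divide the two summed columns.
sums = demo.groupby('state')[['bene_1_count', 'bene_2_count']].sum()
ratio = (sums['bene_1_count'] / sums['bene_2_count']).rename('ratio').reset_index()
print(ratio)
```

The result is a two-column frame (state, ratio), which matches the layout asked for in the question.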

Now say you want each row's bene_1_count divided by its group's bene_2_count total (e.g., the total for AZ) while retaining all the original columns. Adjust the function above: change the calculation and return the whole sub-DataFrame:

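def divide_two_cols(df_sub):
    # Work on a copy: mutating the group that apply passes in can behave unexpectedly.
    df_sub = df_sub.copy()
    # Divide each row's bene_1_count by the group's total bene_2_count.
    df_sub['divs'] = df_sub['bene_1_count'] / float(df_sub['bene_2_count'].sum())
    return df_sub

df.groupby('state').apply(divide_two_cols)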
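The same row-level division can probably also be written with transform, which returns each group's total aligned to the original rows. This is a sketch on invented data (`demo` and `group_total` are illustrative names), not a replacement for the function above.

```python
import pandas as pd

# Illustrative data only; the values are invented for this sketch.
demo = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                     'bene_1_count': [10, 20, 30, 40],
                     'bene_2_count': [5, 10, 15, 30]})

# transform('sum') returns the group total repeated for every row of that group.
group_total = demo.groupby('state')['bene_2_count'].transform('sum')
demo['divs'] = demo['bene_1_count'] / group_total
print(demo)
```

Because the result keeps the original index, it can be assigned straight back as a new column and all original columns are retained.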

answered Sep 21 '22 15:09

ansonw

Tags: python, python-3.x, pandas
