Pandas division of two columns with groupby

This is probably simple, but as a pandas newbie I'm getting stuck.

I have a CSV file that contains three columns: state, bene_1_count, and bene_2_count.

I want to calculate the ratio of 'bene_1_count' and 'bene_2_count' in a given state.

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
           'bene_1_count': [np.random.randint(10000, 99999)
                     for _ in range(12)],
           'bene_2_count': [np.random.randint(10000, 99999)
                     for _ in range(12)]})

I am trying the following, but it is giving me an error: 'No objects to concatenate'

df['ratio'] = df.groupby(['state']).agg(df['bene_1_count']/df['bene_2_count'])

I am not able to figure out how to "reach up" to the state level of the groupby to take the ratio of columns.

I want the ratio of the two columns per state, with output like this:

    State       ratio

    CA  
    WA  
    CO  
    AZ  
Sanjeev asked Feb 04 '17 23:02


1 Answer

You can write a custom function that accepts a DataFrame. The groupby splits the data into one sub-DataFrame per group, and apply then calls your custom function on each sub-DataFrame.

import numpy as np
import pandas as pd

df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
           'bene_1_count': [np.random.randint(10000, 99999)
                     for _ in range(12)],
           'bene_2_count': [np.random.randint(10000, 99999)
                     for _ in range(12)]})

def divide_two_cols(df_sub):
    return df_sub['bene_1_count'].sum() / float(df_sub['bene_2_count'].sum())

df.groupby('state').apply(divide_two_cols)
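
As a possible alternative to apply, the same per-state ratio can be sketched by summing both columns per group and then dividing the two resulting Series. The small frame below uses made-up values (an assumption, chosen so the result is easy to check by hand), and the names sums and ratio are illustrative only:

```python
import pandas as pd

# Hypothetical, deterministic data so the ratios can be verified by hand
df = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                   'bene_1_count': [10, 20, 30, 40],
                   'bene_2_count': [20, 10, 60, 30]})

# Sum both count columns within each state
sums = df.groupby('state')[['bene_1_count', 'bene_2_count']].sum()

# Divide the per-state totals and turn the result into a two-column frame
ratio = (sums['bene_1_count'] / sums['bene_2_count']).rename('ratio').reset_index()
print(ratio)
```

This yields one row per state with columns state and ratio, which matches the layout asked for in the question.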

Now say you want each row's bene_1_count divided by its group's total bene_2_count (e.g., the AZ total) while retaining all the original columns. Adjust divide_two_cols so that it computes the per-row value and returns the whole sub-DataFrame:

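def divide_two_cols(df_sub):
    # Work on a copy: mutating the group object passed in by apply is not supported
    df_sub = df_sub.copy()
    df_sub['divs'] = df_sub['bene_1_count'] / float(df_sub['bene_2_count'].sum())
    return df_sub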

df.groupby('state').apply(divide_two_cols)
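
The row-level version can also be sketched with transform, which returns the group total aligned to the original rows so no custom function is needed. Again the data is made up for illustration, and group_total is a hypothetical name:

```python
import pandas as pd

# Hypothetical, deterministic data so the values can be verified by hand
df = pd.DataFrame({'state': ['CA', 'WA', 'CA', 'WA'],
                   'bene_1_count': [10, 20, 30, 40],
                   'bene_2_count': [20, 10, 60, 30]})

# transform('sum') gives each row the bene_2_count total of its own state
group_total = df.groupby('state')['bene_2_count'].transform('sum')

# Divide row by row; all original columns stay in place
df['divs'] = df['bene_1_count'] / group_total
print(df)
```

Because the result of transform has the same index as df, the new column can be assigned directly without reassembling the groups.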
ansonw answered Sep 21 '22 15:09