Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dividing data from a large dataframe by data in a smaller dataframe based on indices

I have two data frames. I want to create a new column in the first data frame by dividing by specific data in the second data frame, dependent on a date.

import pandas as pd

data1 = {'Count':  {('2020-02-01','Cat', '0'): 10,
                    ('2020-02-01','Dog', '1'): 7,
                    ('2020-02-01','Cat', '2'): 5,

                    ('2020-02-03','Cat', '0'): 3,
                    ('2020-02-03','Cat', '1'): 4,
                    ('2020-02-03','Dog', '2'): 10,

                    ('2020-02-05','Cat', '0'): 1,
                    ('2020-02-05','Dog', '1'): 20,
                    ('2020-02-05','Cat', '2'): 17,

                    ('2020-02-08','Dog', '0'): 13,
                    ('2020-02-08','Dog', '1'): 13,

                    ('2020-02-09','Dog', '2'): 14,
                    ('2020-02-09','Cat', '0'): 30,
                    ('2020-02-09','Dog', '1'): 12,
                    ('2020-02-09','Cat', '2'): 1}}

df1 = pd.DataFrame(data1)
df1.index.names = ['Date', 'Animal', 'Dose'] 

data2 = {'Average':  {('2020-02-01','0'): 1,
                      ('2020-02-03','0'): 5,
                      ('2020-02-05','0'): 3,
                      ('2020-02-08','0'): 31,
                      ('2020-02-09','0'): 14}}

df2 = pd.DataFrame(data2)
df2.index.names = ['Date', 'Dose']

All Dates exist in both data frames.
How do I divide the values in the 'Count' column of df1 by the 'Average' values in df2 based on 'Date'?

Obviously df1['Divided'] = df1['Date','Count'].div[df2['Date','Average'] won't work, but that's kind of the idea.

I've tried all sorts of things up to and including converting to a dictionary and trying to map values, but I couldn't figure it out.

My expected results would look like this:

data3 = [10/1, 7/1, 5/1, 3/5, 4/5, 10/5, 1/3, 20/3, 17/3, 13/31, 13/31, 14/14, 30/14, 12/14, 1/14]

df1['Divided'] = data3

df1
like image 947
Ubertramp Avatar asked Dec 01 '25 02:12

Ubertramp


1 Answers

This would work:

df1 = df1.reset_index(level=['Animal', 'Dose'])
df2 = df2.reset_index(level=['Dose'])
df1["New_Value"] = df1['Count'].div(df2['Average'])
df1 = df1.reset_index().set_index(['Date', 'Animal', 'Dose'])

Output:

                      Count  New_Value
Date       Animal Dose                  
2020-02-01 Cat    0        10  10.000000
                  2         5   5.000000
           Dog    1         7   7.000000
2020-02-03 Cat    0         3   0.600000
                  1         4   0.800000
           Dog    2        10   2.000000
2020-02-05 Cat    0         1   0.333333
                  2        17   5.666667
           Dog    1        20   6.666667
2020-02-08 Dog    0        13   0.419355
                  1        13   0.419355
2020-02-09 Cat    0        30   2.142857
                  2         1   0.071429
           Dog    1        12   0.857143
                  2        14   1.000000

Hope this is what you are looking for.

like image 109
Raj Srujan Jalem Avatar answered Dec 02 '25 17:12

Raj Srujan Jalem