I have two dataframes of differing index lengths that look like this:
df_1:
State Month Total Time ... N columns
AL 4 1000
AL 5 500
.
.
.
VA 11 750
VA 12 1500
df_2:
State Month ... N columns
AL 4
AL 5
.
.
.
VA 11
VA 12
I would like to add a Total Time column to df_2 that uses the values from the Total Time column of df_1 if the State and Month value are the same between dataframes. Ultimately, I would end up with:
df_2:
State Month Total Time ... N columns
AL 4 1000
AL 5 500
.
.
.
VA 11 750
VA 12 1500
I want to do this conditionally since the index lengths are not the same. I have tried this so far:
def f(row):
if (row['State'] == row['State']) & (row['Month'] == row['Month']):
val = x for x in df_1["Total Time"]
return val
df_2['Total Time'] = df_2.apply(f, axis=1)
This did not work. What method would you use to accomplish this? Any help is appreciated!
You can do this:
Consider my sample dataframes:
In [2327]: df_1
Out[2327]:
State Month Total Time
0 AL 2 1000
1 AB 4 500
2 BC 1 600
In [2328]: df_2
Out[2328]:
State Month
0 AL 2
1 AB 5
In [2329]: df_2 = pd.merge(df_2, df_1, on=['State', 'Month'], how='left')
In [2330]: df_2
Out[2330]:
State Month Total Time
0 AL 2 1000.0
1 AB 5 NaN
As mentioned in other comment, pd.merge() is how you would join two dataframes and extract a column. The issue is that merging solely on 'State' and 'Month' would result in every permutation becoming a new column (all Al-4 in df_1 would join with all other AL-4 in df_2).
With your example, there'd be
df_1
State Month Total Time df_1 col...
0 AL 4 1000 6
1 AL 4 500 7
2 VA 12 750 8
3 VA 12 1500 9
df_2
State Month df_2 col...
0 AL 4 1
1 AL 4 2
2 VA 12 3
3 VA 12 4
df_1_cols_to_use = ['State', 'Month', 'Total Time']
# note the selection of the column to use from df_1. We only want the column
# we're merging on, plus the column(s) we want to bring in, in this case 'Total Time'
new_df = pd.merge(df_2, df_1[df_1_cols_to_use], on=['State', 'Month'], how='left')
new_df:
State Month df_2 col... Total Time
0 AL 4 1 1000
1 AL 4 1 500
2 AL 4 2 1000
3 AL 4 2 500
4 VA 12 3 750
5 VA 12 3 1500
6 VA 12 4 750
7 VA 12 4 1500
You mention these have differing index lengths. Based on the parameters of the question, it's not possible to determine what value of Total Time would match up with a row in df_2. If there's three AL-4 entries in df_2, do they each get 1000, 500, or some combination? That info would be needed. Without this, this would be the best guess at getting all possibilities.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With