Python Pandas - Add values from one dataframe to another by matching labels to columns

Question

I know there is an easy, elegant solution to this problem but I am struggling to find it. All I'm trying to do is add a third column to df2 with the corresponding values from df2, based on Date and PN. There may be values in df2 that don't match to df1, and vice versa (fill NaN where there is no match).

df1:

       2017-11-01  2017-11-02  2017-11-03
PN                                                                              
90020       105.0       105.0       105.0
90022       100.0       100.0       100.0 
90061        -3.0        -3.0        -3.0 
90065        30.0        30.0        30.0
90099         2.0         2.0         2.0

df2:

     PN       Date
4   90020 2017-11-01
9   90020 2017-11-02
12  90061 2017-11-01
13  90065 2017-11-02
17  94008 2017-11-03

Desired result:

     PN       Date      Value
4   90020 2017-11-01    105.0
9   90020 2017-11-02    105.0
12  90061 2017-11-01    -3.0
13  90065 2017-11-02    30.0
17  94008 2017-11-03    NaN

jpp · Accepted Answer

As mentioned, you can use pd.melt combined with pd.merge.

Just remember to reset_index to promote your index to a column, and then to align column names.

df1 = pd.melt(df.reset_index(), id_vars=df.index.name)\
        .rename(columns={'variable': 'Date'})

res = pd.merge(df2, df1, how='left')

#       PN        Date  value
# 0  90020  2017-11-01  105.0
# 1  90020  2017-11-02  105.0
# 2  90061  2017-11-01   -3.0
# 3  90065  2017-11-02   30.0
# 4  94008  2017-11-03    NaN

To maintain df2 index:

res = df2.reset_index()\
         .merge(df1, how='left')\
         .set_index('index')

#           PN        Date  value
# index                          
# 4      90020  2017-11-01  105.0
# 9      90020  2017-11-02  105.0
# 12     90061  2017-11-01   -3.0
# 13     90065  2017-11-02   30.0
# 17     94008  2017-11-03    NaN

Anton vBR · Answer

Another solution would be to create a map of df1 using stack() and then apply it to the zip of the columns.

m = df1.stack()
df2['value'] = pd.Series(list(zip(df2.PN, df2.Date))).map(m).values

#          PN        Date  value
#index                          
#4      90020  2017-11-01  105.0
#9      90020  2017-11-02  105.0
#12     90061  2017-11-01   -3.0
#13     90065  2017-11-02   30.0
#17     94008  2017-11-03    NaN

Time comparison:

jpp's melt and merge: 100 loops, best of 3: 4.41 ms per loop

avbr's stack and map: 100 loops, best of 3: 2.97 ms per loop

ScottB's stack, reindex: 100 loops, best of 3: 3.68 ms per loop

ThisGuy's function,listcomprehension: 100 loops, best of 3: 5.79 ms per loop

Full example:

import pandas as pd

data1 = '''\
PN         2017-11-01  2017-11-02  2017-11-03
90020       105.0       105.0       105.0
90022       100.0       100.0       100.0
90061        -3.0        -3.0        -3.0 
90065        30.0        30.0        30.0
90099         2.0         2.0         2.0'''

data2 = '''\
index     PN       Date
4   90020 2017-11-01
9   90020 2017-11-02
12  90061 2017-11-01
13  90065 2017-11-02
17  94008 2017-11-03
'''

df1 = pd.read_csv(pd.compat.StringIO(data1), sep='\s+').set_index('PN')
df2 = pd.read_csv(pd.compat.StringIO(data2), sep='\s+').set_index('index')

m = df1.stack()
df2['value'] = pd.Series(list(zip(df2.PN, df2.Date))).map(m).values

Python Pandas - Add values from one dataframe to another by matching labels to columns

Tags:

python

pandas

dataframe

bcal

2 Answers

jpp

Anton vBR

Recent Activity

Donate For Us

Python Pandas - Add values from one dataframe to another by matching labels to columns

Tags:

python

pandas

dataframe

bcal

2 Answers

jpp

Anton vBR

Related questions

Recent Activity

Donate For Us