Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas - Add values from one dataframe to another by matching labels to columns

I know there is an easy, elegant solution to this problem but I am struggling to find it. All I'm trying to do is add a third column to df2 with the corresponding values from df2, based on Date and PN. There may be values in df2 that don't match to df1, and vice versa (fill NaN where there is no match).

df1:

       2017-11-01  2017-11-02  2017-11-03
PN                                                                              
90020       105.0       105.0       105.0
90022       100.0       100.0       100.0 
90061        -3.0        -3.0        -3.0 
90065        30.0        30.0        30.0
90099         2.0         2.0         2.0

df2:

     PN       Date
4   90020 2017-11-01
9   90020 2017-11-02
12  90061 2017-11-01
13  90065 2017-11-02
17  94008 2017-11-03

Desired result:

     PN       Date      Value
4   90020 2017-11-01    105.0
9   90020 2017-11-02    105.0
12  90061 2017-11-01    -3.0
13  90065 2017-11-02    30.0
17  94008 2017-11-03    NaN
like image 729
bcal Avatar asked Apr 09 '18 16:04

bcal


2 Answers

As mentioned, you can use pd.melt combined with pd.merge.

Just remember to reset_index to promote your index to a column, and then to align column names.

df1 = pd.melt(df.reset_index(), id_vars=df.index.name)\
        .rename(columns={'variable': 'Date'})

res = pd.merge(df2, df1, how='left')

#       PN        Date  value
# 0  90020  2017-11-01  105.0
# 1  90020  2017-11-02  105.0
# 2  90061  2017-11-01   -3.0
# 3  90065  2017-11-02   30.0
# 4  94008  2017-11-03    NaN

To maintain df2 index:

res = df2.reset_index()\
         .merge(df1, how='left')\
         .set_index('index')

#           PN        Date  value
# index                          
# 4      90020  2017-11-01  105.0
# 9      90020  2017-11-02  105.0
# 12     90061  2017-11-01   -3.0
# 13     90065  2017-11-02   30.0
# 17     94008  2017-11-03    NaN
like image 122
jpp Avatar answered Oct 01 '22 06:10

jpp


Another solution would be to create a map of df1 using stack() and then apply it to the zip of the columns.

m = df1.stack()
df2['value'] = pd.Series(list(zip(df2.PN, df2.Date))).map(m).values

#          PN        Date  value
#index                          
#4      90020  2017-11-01  105.0
#9      90020  2017-11-02  105.0
#12     90061  2017-11-01   -3.0
#13     90065  2017-11-02   30.0
#17     94008  2017-11-03    NaN

Time comparison:

jpp's melt and merge: 100 loops, best of 3: 4.41 ms per loop

avbr's stack and map: 100 loops, best of 3: 2.97 ms per loop

ScottB's stack, reindex: 100 loops, best of 3: 3.68 ms per loop

ThisGuy's function,listcomprehension: 100 loops, best of 3: 5.79 ms per loop


Full example:

import pandas as pd

data1 = '''\
PN         2017-11-01  2017-11-02  2017-11-03
90020       105.0       105.0       105.0
90022       100.0       100.0       100.0
90061        -3.0        -3.0        -3.0 
90065        30.0        30.0        30.0
90099         2.0         2.0         2.0'''

data2 = '''\
index     PN       Date
4   90020 2017-11-01
9   90020 2017-11-02
12  90061 2017-11-01
13  90065 2017-11-02
17  94008 2017-11-03
'''

df1 = pd.read_csv(pd.compat.StringIO(data1), sep='\s+').set_index('PN')
df2 = pd.read_csv(pd.compat.StringIO(data2), sep='\s+').set_index('index')

m = df1.stack()
df2['value'] = pd.Series(list(zip(df2.PN, df2.Date))).map(m).values
like image 34
Anton vBR Avatar answered Oct 01 '22 07:10

Anton vBR