I know there is an easy, elegant solution to this problem but I am struggling to find it. All I'm trying to do is add a third column to df2 with the corresponding values from df2, based on Date and PN. There may be values in df2 that don't match to df1, and vice versa (fill NaN where there is no match).
df1:
2017-11-01 2017-11-02 2017-11-03
PN
90020 105.0 105.0 105.0
90022 100.0 100.0 100.0
90061 -3.0 -3.0 -3.0
90065 30.0 30.0 30.0
90099 2.0 2.0 2.0
df2:
PN Date
4 90020 2017-11-01
9 90020 2017-11-02
12 90061 2017-11-01
13 90065 2017-11-02
17 94008 2017-11-03
Desired result:
PN Date Value
4 90020 2017-11-01 105.0
9 90020 2017-11-02 105.0
12 90061 2017-11-01 -3.0
13 90065 2017-11-02 30.0
17 94008 2017-11-03 NaN
As mentioned, you can use pd.melt
combined with pd.merge
.
Just remember to reset_index
to promote your index to a column, and then to align column names.
df1 = pd.melt(df.reset_index(), id_vars=df.index.name)\
.rename(columns={'variable': 'Date'})
res = pd.merge(df2, df1, how='left')
# PN Date value
# 0 90020 2017-11-01 105.0
# 1 90020 2017-11-02 105.0
# 2 90061 2017-11-01 -3.0
# 3 90065 2017-11-02 30.0
# 4 94008 2017-11-03 NaN
To maintain df2
index:
res = df2.reset_index()\
.merge(df1, how='left')\
.set_index('index')
# PN Date value
# index
# 4 90020 2017-11-01 105.0
# 9 90020 2017-11-02 105.0
# 12 90061 2017-11-01 -3.0
# 13 90065 2017-11-02 30.0
# 17 94008 2017-11-03 NaN
Another solution would be to create a map of df1 using stack()
and then apply it to the zip of the columns.
m = df1.stack()
df2['value'] = pd.Series(list(zip(df2.PN, df2.Date))).map(m).values
# PN Date value
#index
#4 90020 2017-11-01 105.0
#9 90020 2017-11-02 105.0
#12 90061 2017-11-01 -3.0
#13 90065 2017-11-02 30.0
#17 94008 2017-11-03 NaN
Time comparison:
jpp's melt and merge: 100 loops, best of 3: 4.41 ms per loop
avbr's stack and map: 100 loops, best of 3: 2.97 ms per loop
ScottB's stack, reindex: 100 loops, best of 3: 3.68 ms per loop
ThisGuy's function,listcomprehension: 100 loops, best of 3: 5.79 ms per loop
Full example:
import pandas as pd
data1 = '''\
PN 2017-11-01 2017-11-02 2017-11-03
90020 105.0 105.0 105.0
90022 100.0 100.0 100.0
90061 -3.0 -3.0 -3.0
90065 30.0 30.0 30.0
90099 2.0 2.0 2.0'''
data2 = '''\
index PN Date
4 90020 2017-11-01
9 90020 2017-11-02
12 90061 2017-11-01
13 90065 2017-11-02
17 94008 2017-11-03
'''
df1 = pd.read_csv(pd.compat.StringIO(data1), sep='\s+').set_index('PN')
df2 = pd.read_csv(pd.compat.StringIO(data2), sep='\s+').set_index('index')
m = df1.stack()
df2['value'] = pd.Series(list(zip(df2.PN, df2.Date))).map(m).values
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With