I am still quite new to joining/merging data in Pandas, so would therefore very much appreciate any help to do the following operation. I have the following three SQL tables (converted to DataFrames) of data:
df1
Out[14]:
---- fruit price qty
2010 apple 1.0 2.0
2011 apple 3.0 4.0
2010 banana 0.5 1.5
2011 banana 7.0 8.0
df2
Out[15]:
---- fruit weight
2010 apple 10
2010 banana 12
df3
Out[16]:
-- fruit colour
0 apple red
1 banana yellow
Where df2 has the same fruits as df1, but not the same years (I'm almost completely sure that df2's years are a subset of df1, although it would be nice to find a method, that allows for years in df2 that aren't included in df1). Df3 is a table with characters for all the fruits contained in df2 and df1. I would like to merge the three tables together, so each row in the new combined DataFrame has year, fruit, price, qty, weight (possibly NaN) and colour. I am not sure if such a data structure would be best contained in a Panel or a DataFrame - inputs on this are also very welcome. Thanks!
To ensure there isn't an issue with the years, I would first reset_index:
In [11]: df1.index.name = 'year'
In [12]: df2.index.name = 'year'
In [13]: df1.reset_index(inplace=True)
In [14]: df2.reset_index(inplace=True)
In [15]: df1
Out[15]:
year fruit price qty
0 2010 apple 1.0 2.0
1 2011 apple 3.0 4.0
2 2010 banana 0.5 1.5
3 2011 banana 7.0 8.0
[4 rows x 4 columns]
In [16]: df2
Out[16]:
year fruit weight
0 2010 apple 10
1 2010 banana 12
[2 rows x 3 columns]
Now you can get your result by merging (twice):
In [17]: df1.merge(df2, how='left').merge(df3, how='left')
Out[17]:
year fruit price qty weight colour
0 2010 apple 1.0 2.0 10 red
1 2011 apple 3.0 4.0 NaN red
2 2010 banana 0.5 1.5 12 yellow
3 2011 banana 7.0 8.0 NaN yellow
[4 rows x 6 columns]
If you were confident that there was only one weight of fruit (i.e. independent of the year) you could just drop the year column from df2:
In [18]: del df2['year']
In [19]: df1.merge(df2, how='left').merge(df3, how='left')
Out[19]:
year fruit price qty weight colour
0 2010 apple 1.0 2.0 10 red
1 2011 apple 3.0 4.0 10 red
2 2010 banana 0.5 1.5 12 yellow
3 2011 banana 7.0 8.0 12 yellow
[4 rows x 6 columns]
Otherwise you could do a groupby and ffill.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With