i am trying to join two data frames but cannot get my head around the possibilities Python has to offer.
First dataframe:
ID MODEL REQUESTS ORDERS
1 Golf 123 4
2 Passat 34 5
3 Model 3 500 8
4 M3 5 0
Second dataframe:
MODEL TYPE MAKE
Golf Sedan Volkswagen
M3 Coupe BMW
Model 3 Sedan Tesla
What I want is to add another column in the first dataframe called "make" so that it looks like this:
ID MODEL MAKE REQUESTS ORDERS
1 Golf Volkswagen 123 4
2 Passat Volkswagen 34 5
3 Model 3 Tesla 500 8
4 M3 BMW 5 0
I already looked at merge, join and map but all examples just appended the required information at the end of the dataframe.
The superior alternative to VLOOKUP is INDEX MATCH. While VLOOKUP works fine in most cases, it tends to reveal flaws when you start using it in large and complex sheets. The INDEX MATCH formula is actually two different functions; INDEX and MATCH. array is range of cells or an array constant.
To do this without pandas (and assuming you know the structure of your data + it fits in memory), you can iterate through the csv file and store the results in a dictionary, where you fill the entries where the ID maps to the other information that you want to keep.
DataFrame - lookup() function The lookup() function returns label-based "fancy indexing" function for DataFrame. Given equal-length arrays of row and column labels, return an array of the values corresponding to each (row, col) pair.
XLOOKUP is much more flexible than VLOOKUP, which can look up values only in the leftmost column of a table, and return values from corresponding columns on the right, as we saw in the example above. In contrast, the XLOOKUP model requires simpler steps and can return values for any column in either direction.
I think you can use insert
with map
by Series
created with df2
(if some value in column MODEL
in df2
is missing get NaN
):
df1.insert(2, 'MAKE', df1['MODEL'].map(df2.set_index('MODEL')['MAKE']))
print (df1)
ID MODEL MAKE REQUESTS ORDERS
0 1 Golf Volkswagen 123 4
1 2 Passat NaN 34 5
2 3 Model 3 Tesla 500 8
3 4 M3 BMW 5 0
Although not in this case, but there might be scenarios where df2 has more than two columns and you would just want to add one out of those to df1 based on a specific column as key. Here is a generic code that you may find useful.
df = pd.merge(df1, df2[['MODEL', 'MAKE']], on = 'MODEL', how = 'left')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With