Python - function similar to VLOOKUP (Excel)

Tags:

python

pandas

dataframe

vlookup

I am trying to join two DataFrames but cannot get my head around the options Python has to offer.

First dataframe:

ID MODEL   REQUESTS ORDERS
1  Golf    123      4
2  Passat  34       5
3  Model 3 500      8
4  M3      5        0

Second dataframe:

MODEL   TYPE  MAKE
Golf    Sedan Volkswagen
M3      Coupe BMW
Model 3 Sedan Tesla

What I want is to add another column in the first dataframe called "make" so that it looks like this:

ID MODEL   MAKE       REQUESTS ORDERS
1  Golf    Volkswagen 123      4
2  Passat  Volkswagen 34       5
3  Model 3 Tesla      500      8
4  M3      BMW        5        0

I already looked at merge, join and map but all examples just appended the required information at the end of the dataframe.

699

asked Jan 06 '17 18:01

Christian

2 Answers

I think you can use insert with map, mapping MODEL through a Series created from df2 (any MODEL value in df1 that is missing from df2 gets NaN):

df1.insert(2, 'MAKE', df1['MODEL'].map(df2.set_index('MODEL')['MAKE']))
print(df1)
   ID    MODEL        MAKE  REQUESTS  ORDERS
0   1     Golf  Volkswagen       123       4
1   2   Passat         NaN        34       5
2   3  Model 3       Tesla       500       8
3   4       M3         BMW         5       0
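
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The two frames are rebuilt from the tables in the question. The `lookup` name and the `drop_duplicates` call are my own additions, not part of the answer above: `map` needs a lookup Series with a unique index, so the sketch assumes that keeping the first row per MODEL is acceptable.

```python
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "MODEL": ["Golf", "Passat", "Model 3", "M3"],
    "REQUESTS": [123, 34, 500, 5],
    "ORDERS": [4, 5, 8, 0],
})
df2 = pd.DataFrame({
    "MODEL": ["Golf", "M3", "Model 3"],
    "TYPE": ["Sedan", "Coupe", "Sedan"],
    "MAKE": ["Volkswagen", "BMW", "Tesla"],
})

# Lookup Series (hypothetical name): index is MODEL, values are MAKE.
# drop_duplicates is an added safeguard so the index is unique.
lookup = df2.drop_duplicates("MODEL").set_index("MODEL")["MAKE"]

# Insert MAKE at position 2, right after MODEL. This modifies df1 in place.
# Models absent from df2 (here: Passat) end up as NaN.
df1.insert(2, "MAKE", df1["MODEL"].map(lookup))
print(df1)
```

Note that `insert` raises a ValueError if a MAKE column already exists, so rerunning the last step on the same frame will fail unless the column is dropped first.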

150

answered Oct 16 '22 02:10

jezrael

There may be scenarios where df2 has more than two columns and you want to add just one of them to df1, using a specific column as the key. Here is a generic snippet that you may find useful.

import pandas as pd
df = pd.merge(df1, df2[['MODEL', 'MAKE']], on='MODEL', how='left')
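
Since the question asks for MAKE to sit next to MODEL rather than at the end, the merge can be followed by a column move. The sketch below is one possible way to do that, assuming MODEL is unique in df2 so the left merge keeps one row per row of df1. The `merged` and `make` names are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "MODEL": ["Golf", "Passat", "Model 3", "M3"],
    "REQUESTS": [123, 34, 500, 5],
    "ORDERS": [4, 5, 8, 0],
})
df2 = pd.DataFrame({
    "MODEL": ["Golf", "M3", "Model 3"],
    "TYPE": ["Sedan", "Coupe", "Sedan"],
    "MAKE": ["Volkswagen", "BMW", "Tesla"],
})

# Left merge on MODEL, bringing over only the MAKE column from df2.
# The new column is appended at the end of the result.
merged = pd.merge(df1, df2[["MODEL", "MAKE"]], on="MODEL", how="left")

# Remove MAKE and re-insert it directly after MODEL.
make = merged.pop("MAKE")
merged.insert(merged.columns.get_loc("MODEL") + 1, "MAKE", make)
print(merged)
```

Unlike the `insert` plus `map` approach, this returns a new DataFrame and leaves df1 untouched.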

answered Oct 16 '22 02:10

Bhagabat Behera
