JOIN two dataframes on common column in python

Tags:

python

join

pandas

I have a dataframe df:

    id   name   count
    1    a       10
    2    b       20
    3    c       30
    4    d       40
    5    e       50

Here I have another dataframe df2:

    id1  price   rating
    1     100     1.0
    2     200     2.0
    3     300     3.0
    5     500     5.0

I want to join these two dataframes on columns id and id1 (both refer to the same thing). Here is an example of the desired result, df3:

    id   name   count   price   rating
    1    a       10      100      1.0
    2    b       20      200      2.0
    3    c       30      300      3.0
    4    d       40      NaN      NaN
    5    e       50      500      5.0

Should I use df.merge or pd.concat?


asked Jan 04 '17 11:01

Shubham R

2 Answers

Use merge:

    print(pd.merge(df, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))

       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0
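For anyone who wants to run this end to end, here is a minimal sketch. The constructor calls are my assumption about how the data was built, since the question only shows the printed tables. It also hints at why pd.concat is a poor fit here: concat with axis=1 lines rows up by index position, not by the id values.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (construction is assumed).
df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "name": list("abcde"),
                   "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# Left merge keeps every row of df; ids missing from df2 get NaN.
merged = (df.merge(df2, left_on="id", right_on="id1", how="left")
            .drop(columns="id1"))
print(merged)

# concat(axis=1) aligns on the row index (0, 1, 2, ...), not on id/id1,
# so the df2 row for id1 == 5 ends up next to the df row for id == 4.
side_by_side = pd.concat([df, df2], axis=1)
print(side_by_side)
```

So merge (or join) is the tool to reach for when rows have to be matched on key values.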

Another solution is to simply rename the column so that both frames share the key name:

    print(pd.merge(df, df2.rename(columns={'id1':'id'}), on='id', how='left'))

       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0

If you need only the price column, the simplest approach is map:

    df['price'] = df['id'].map(df2.set_index('id1')['price'])
    print(df)

       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0
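A slightly more defensive sketch of the map approach, with the frames rebuilt from the question's tables (an assumption). The uniqueness guard and the use of assign are my additions, not part of the answer. map looks each id up by index label, so it only makes sense when every id1 appears once in df2. assign returns a new frame instead of adding the column in place, which avoids suffixed price_x / price_y columns if the same frame is merged with df2 again later.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (construction is assumed).
df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "name": list("abcde"),
                   "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# Lookup Series: index is id1, values are price.
price_by_id = df2.set_index("id1")["price"]

# Guard (my addition): a label lookup is ambiguous if an id1 repeats.
if not price_by_id.index.is_unique:
    raise ValueError("duplicate id1 values in df2; deduplicate before mapping")

# assign returns a new frame and leaves df itself untouched.
with_price = df.assign(price=df["id"].map(price_by_id))
print(with_price)
```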

Two more solutions that return only price. Either drop rating after the merge:

    print(pd.merge(df, df2, left_on='id', right_on='id1', how='left')
            .drop(['id1', 'rating'], axis=1))

       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0

or select the needed columns of df2 before merging:

    print(pd.merge(df, df2[['id1','price']], left_on='id', right_on='id1', how='left')
            .drop('id1', axis=1))

       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0
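If it helps to see which rows found no partner in df2, merge also accepts indicator=True, which adds a _merge column. This goes a little beyond what the answer covers, so treat it as an optional sketch; the frames are rebuilt from the question's tables as an assumption.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (construction is assumed).
df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "name": list("abcde"),
                   "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left merge.
checked = df.merge(df2[["id1", "price"]], left_on="id", right_on="id1",
                   how="left", indicator=True)

# ids from df that have no matching id1 in df2
unmatched_ids = checked.loc[checked["_merge"] == "left_only", "id"].tolist()
print(unmatched_ids)
```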

answered Sep 22 '22 21:09

jezrael

join utilizes the index to merge on unless we specify a column to use instead. However, we can only specify a column instead of the index for the 'left' dataframe.

Strategy:

- set_index on df2 to be id1
- use join with df as the left dataframe and id as the on parameter. Note that I could have used set_index('id') on df to avoid having to use the on parameter. However, this allowed me to leave the column in the dataframe rather than having to reset_index later.

    df.join(df2.set_index('id1'), on='id')

       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0
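The alternative mentioned in the note above (indexing df by id as well, then resetting the index) might look like the sketch below. The frame construction is assumed from the question's tables, and rename_axis is only there to make the index name explicit before reset_index turns it back into a column.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (construction is assumed).
df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "name": list("abcde"),
                   "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# Index both frames by their key, join index-to-index (left join by default),
# then move the key back into an ordinary column.
alt = (df.set_index("id")
         .join(df2.set_index("id1"))
         .rename_axis("id")
         .reset_index())
print(alt)
```

The result has the same columns as the on='id' version; the cost is the extra set_index / reset_index round trip.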

If you only want price from df2:

    df.join(df2.set_index('id1')[['price']], on='id')

       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0

answered Sep 18 '22 21:09

piRSquared
