Duplicated rows when merging dataframes in Python

Tags: python, merge, python-3.x, pandas, python-2.7

I am merging two dataframes on email_address (how='inner' in the code below). After merging, the rows are duplicated even though the columns I merged on contain the same values.

Specifically, I have the following code.

merged_df = pd.merge(df1, df2, on=['email_address'], how='inner')

Here are the two dataframes and the results.

df1

          email_address    name   surname
0  john.smith@email.com    john     smith
1  john.smith@email.com    john     smith
2       elvis@email.com   elvis   presley

df2

          email_address    street  city
0  john.smith@email.com   street1    NY
1  john.smith@email.com   street1    NY
2       elvis@email.com   street2    LA

merged_df

          email_address    name   surname    street  city
0  john.smith@email.com    john     smith   street1    NY
1  john.smith@email.com    john     smith   street1    NY
2  john.smith@email.com    john     smith   street1    NY
3  john.smith@email.com    john     smith   street1    NY
4       elvis@email.com   elvis   presley   street2    LA
5       elvis@email.com   elvis   presley   street2    LA

Shouldn't the result look like this instead? This is what I would like merged_df to be:

          email_address    name   surname    street  city
0  john.smith@email.com    john     smith   street1    NY
1  john.smith@email.com    john     smith   street1    NY
2       elvis@email.com   elvis   presley   street2    LA

Is there a way to achieve this?

asked Aug 18 '16 13:08

Roberto Bertinetti

1 Answer

list_2_nodups = list_2.drop_duplicates()
pd.merge(list_1, list_2_nodups, on=['email_address'])

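The duplicate rows are expected. Each john smith row in list_1 matches each john smith row in list_2, so two rows on each side produce 2 x 2 = 4 merged rows. To avoid that, I had to drop the duplicates in one of the lists before merging. I chose list_2. (list_1 and list_2 here correspond to df1 and df2 in the question.)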
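For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the two frames as they are printed in the question and applies the same drop-then-merge idea. The names naive, john_rows and df2_nodups are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Rebuild the two frames as printed in the question.
df1 = pd.DataFrame({
    "email_address": ["john.smith@email.com", "john.smith@email.com", "elvis@email.com"],
    "name": ["john", "john", "elvis"],
    "surname": ["smith", "smith", "presley"],
})
df2 = pd.DataFrame({
    "email_address": ["john.smith@email.com", "john.smith@email.com", "elvis@email.com"],
    "street": ["street1", "street1", "street2"],
    "city": ["NY", "NY", "LA"],
})

# Many-to-many merge: every left row for a key pairs with every right row for that key.
naive = pd.merge(df1, df2, on=["email_address"], how="inner")
john_rows = int((naive["email_address"] == "john.smith@email.com").sum())

# Drop fully identical rows on the right side so each key appears once, then merge again.
df2_nodups = df2.drop_duplicates()
merged_df = pd.merge(df1, df2_nodups, on=["email_address"], how="inner")

print(john_rows)  # rows for the john.smith key: 2 left rows x 2 right rows
print(merged_df)
```

Note that drop_duplicates() with no arguments only removes rows that are identical in every column. If df2 could hold two different addresses for the same email, something like df2.drop_duplicates(subset=["email_address"]) would keep just the first row per key, which may or may not be what you want.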

answered Sep 23 '22 18:09

piRSquared
