Pandas' merge returns a column with _x appended to the name

Tags:

python

pandas

I have two dataframes: df1 has columns A, B, C, D... and df2 has columns A, B, E, F...

The keys I want to merge on are in column A. B is also (most likely) the same in both dataframes. However, this is a big data set that I am still cleaning, so I do not have a very good overview of everything yet.

I do

merge(df1, df2, on='A')

The result contains a column called B_x. Since the data set is big and messy, I haven't yet investigated how B_x differs from B in df1 and B in df2.

So my question is a general one: what does pandas mean when it appends _x to a column name in the merged dataframe?

Thank you

asked Apr 21 '14 12:04

luffe

1 Answer

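The suffixes are added to any column names that clash between the two dataframes and are not used as merge keys. By default the column from the left dataframe gets _x and the column from the right dataframe gets _y; see the suffixes parameter in the online docs for merge.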
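As a minimal sketch of that behaviour, here are two toy frames (the data is made up for illustration) that share a non-key column B. Merging on A alone keeps both copies of B and renames them, and the suffixes argument lets you choose more descriptive names:

```python
import pandas as pd

# Hypothetical toy data: both frames have a non-key column named B.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "q", "w"], "E": [0.1, 0.2, 0.4]})

# B is not a merge key, so both copies survive and get the default suffixes.
merged = pd.merge(df1, df2, on="A")
print(merged.columns.tolist())  # ['A', 'B_x', 'C', 'B_y', 'E']

# The suffixes can be customised to show where each column came from.
named = pd.merge(df1, df2, on="A", suffixes=("_df1", "_df2"))
print(named.columns.tolist())  # ['A', 'B_df1', 'C', 'B_df2', 'E']
```

So B_x is simply B from the left frame (df1) and B_y is B from the right frame (df2).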

So in your case, if you think they are the same, you could just merge on both columns:

pd.merge(df1, df2, on=['A', 'B'])

Note, though, that this returns only the rows whose combination of A and B exists in both dataframes, because the default merge type is an inner merge.
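To illustrate with made-up data: in the sketch below only the pair (1, 'x') occurs in both frames, so the inner merge on both columns keeps a single row, and B appears once without a suffix because it is now a key.

```python
import pandas as pd

# Hypothetical toy data; only the pair (A=1, B='x') is present in both frames.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "q", "w"], "E": [0.1, 0.2, 0.4]})

# Default how='inner': rows whose (A, B) pair is missing on either side are dropped.
both = pd.merge(df1, df2, on=["A", "B"])
print(both.columns.tolist())  # ['A', 'B', 'C', 'E']
print(len(both))              # 1
```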

So you could compare the size of this merged df with the one from your first merge (on A alone). If they are the same, you could merge on both columns, or just drop/rename the _x/_y suffixed B columns.
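One way to make that comparison concrete, sketched on made-up data: compare the row counts of the two merges, and also look directly at the rows where B_x and B_y disagree. The variable names here are illustrative only.

```python
import pandas as pd

# Hypothetical toy data: A=2 has a different B in each frame.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "q", "w"], "E": [0.1, 0.2, 0.4]})

on_a = pd.merge(df1, df2, on="A")          # B becomes B_x / B_y
on_ab = pd.merge(df1, df2, on=["A", "B"])  # B is part of the key

# Fewer rows in on_ab means some keys in A carry different B values.
print(len(on_a), len(on_ab))

# Inspect the disagreeing rows directly. Note that NaN != NaN, so rows
# with missing B values may also show up here.
mismatch = on_a[on_a["B_x"] != on_a["B_y"]]
print(mismatch)
```

If mismatch is empty, the two B columns agree wherever A matches, and dropping one of them and renaming the other loses nothing.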

I would spend some time, though, determining whether these values are indeed the same and exist in both dataframes. To keep the rows that appear in only one of them, you may wish to perform an outer merge:

pd.merge(df1, df2, on=['A', 'B'], how='outer')
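As a sketch on made-up data, adding indicator=True (an optional merge parameter, not part of the call above) produces a _merge column that records whether each row came from the left frame only, the right frame only, or both. That can help when auditing a messy data set:

```python
import pandas as pd

# Hypothetical toy data; only the pair (A=1, B='x') is present in both frames.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "q", "w"], "E": [0.1, 0.2, 0.4]})

# how='outer' keeps every (A, B) pair from either side; columns that have
# no counterpart on the other side are filled with NaN.
outer = pd.merge(df1, df2, on=["A", "B"], how="outer", indicator=True)
print(outer)

# Count how many rows matched and how many came from one side only.
print(outer["_merge"].value_counts())
```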

You could then drop duplicate rows (and possibly any NaN rows), which should give you a clean merged dataframe.

merged_df.drop_duplicates(subset=['A', 'B'], inplace=True)

See the online docs for drop_duplicates.
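A minimal sketch of that clean-up step, using a made-up merged_df. It returns new frames instead of modifying in place, and it assumes that a missing B is what makes a row not worth keeping; adjust the subset to whichever columns matter in your data.

```python
import pandas as pd

# Hypothetical merged result with a repeated (A, B) pair and a missing B.
merged_df = pd.DataFrame(
    {"A": [1, 1, 2, 3], "B": ["x", "x", "y", None], "C": [10, 11, 20, 30]}
)

# Keep the first row for each (A, B) pair; later repeats are dropped.
deduped = merged_df.drop_duplicates(subset=["A", "B"])

# Optionally drop the rows in which B is missing.
clean = deduped.dropna(subset=["B"])
print(clean)
```

Note that drop_duplicates keeps the first occurrence by default, so the row with C=11 is discarded here even though its C value differs from the row that is kept.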

answered Sep 20 '22 17:09

EdChum
