Pandas merge unexpectedly produces suffixes

Tags: merge, python-3.x, pandas

I am merging two Pandas DataFrames and getting "_x" and "_y" suffixes on the result. An easy-to-replicate example is below. I tried adding suffixes=(False, False) to the merge call, but that raises an error: ValueError: columns overlap but no suffix specified: Index(['f1', 'f2', 'f3'], dtype='object'). I must be missing something obvious here. I understand why this would occur using join, but I didn't expect it for merge.

Please ignore the "copy of a slice" warning (SettingWithCopyWarning). I can't figure out why it isn't raised on Line 10 but is raised on Line 17. (If you know, there's an open question here on it!)

System details: Windows 10
conda 4.8.2
Python 3.8.3
pandas 1.0.5 py38he6e81aa_0 conda-forge

import pandas as pd

#### Build an example DataFrame for easy-to-replicate example ####
myid = [1, 1, 1, 2, 2]
myorder = [3, 2, 1, 2, 1]
y = [3642, 3640, 3632, 3628, 3608]
x = [11811, 11812, 11807, 11795, 11795]
df = pd.DataFrame(list(zip(myid, myorder, x, y)), 
                  columns =['myid', 'myorder', 'x', 'y']) 
df.sort_values(by=['myid', 'myorder'], inplace=True) #Line10
df.reset_index(drop=True, inplace=True)
display(df.style.hide_index())

### Typical analysis on existing DataFrame, Error occurs in here ####
for id in df.myid.unique():
    tempdf = df[df.myid == id]
    tempdf.sort_values(by=['myid', 'myorder'], inplace=True) #Line17
    tempdf.reset_index(drop=True, inplace=True)
    for i, r in tempdf.iloc[1:].iterrows():
        ## in reality, calling a more complicated function here
        ## this is just a simple example
        tempdf.loc[i, 'f1'] = tempdf.x[i-1] - tempdf.x[i]
        tempdf.loc[i, 'f2'] = tempdf.y[i-1] - tempdf.y[i]
        tempdf.loc[i, 'f3'] = tempdf.y[i] +2
   
    what_i_care_about = ['myid', 'myorder', 'f1', 'f2', 'f3']

    df = pd.merge(df, tempdf[what_i_care_about], 
                  on=['myid', 'myorder'], how='outer')
    del tempdf

display(df.style.hide_index())

asked Jul 07 '20 15:07

a11

1 Answer

Your problem is that both source DataFrames share columns that you are not merging on. Pandas needs a way to tell which one came from where, so it adds suffixes; the defaults are '_x' for the left and '_y' for the right.

If you have a preference for which source DataFrame to keep the columns from, you can set the suffixes and filter accordingly. For example, to keep the clashing columns from the left:
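
A minimal sketch of that behaviour, assuming two small made-up frames (the names `left` and `right` and their values are hypothetical, not taken from the question). Both share the key `myid` and the non-key column `f1`, so the default suffixes appear in the result:

```python
import pandas as pd

# Hypothetical frames: 'myid' is the merge key, 'f1' exists on both sides
left = pd.DataFrame({"myid": [1, 2], "f1": [10.0, 20.0]})
right = pd.DataFrame({"myid": [1, 2], "f1": [0.5, 1.5]})

# 'f1' is not in `on`, so pandas keeps both copies and renames them
merged = pd.merge(left, right, on="myid", how="outer")
print(list(merged.columns))  # ['myid', 'f1_x', 'f1_y']
```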

# Label the two sides, with no suffix on the side you want to keep
df = pd.merge(
    df, 
    tempdf[what_i_care_about], 
    on=['myid', 'myorder'], 
    how='outer',
    suffixes=('', '_delme')  # Left gets no suffix, right gets something identifiable
)
# Discard the columns that acquired a suffix
df = df[[c for c in df.columns if not c.endswith('_delme')]]

Alternatively, you can drop one of each pair of clashing columns before merging, so that Pandas has no need to assign a suffix.
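
One way the drop-before-merge approach might look, as a sketch on made-up data (`left`, `right`, `keys` and `clashing` are illustrative names, not from the question). Here the left-hand copies are dropped so the right-hand values survive; dropping from the right instead would keep the left-hand ones:

```python
import pandas as pd

# Hypothetical frames: merge on 'myid'; 'f1' clashes, 'f2' is only on the right
left = pd.DataFrame({"myid": [1, 2], "x": [11811, 11795], "f1": [None, None]})
right = pd.DataFrame({"myid": [1, 2], "f1": [1.0, 2.0], "f2": [3.0, 4.0]})

keys = ["myid"]
# Non-key columns that are present on both sides
clashing = [c for c in left.columns if c in right.columns and c not in keys]

# Drop the left-hand copies first, so no suffix is needed
merged = pd.merge(left.drop(columns=clashing), right, on=keys, how="outer")
print(list(merged.columns))  # ['myid', 'x', 'f1', 'f2']
```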

answered Oct 21 '22 07:10

Chris Cooper
