Is there a way to ignore duplicate column names when using pandas' to_excel() function?
Say I have old_wb.xlsx:
>> df1 = pd.read_excel('old_wb.xlsx')
---------------------merged header--------------------
col1 col2 col3 col1 col4 col1 col2 col5
test test test test test test test test
and say I did some processing on my Excel file, for example removing the merged header, and then saved it to another Excel file:
>> df1.to_excel('new_wb.xlsx', 'Sheet1', merge_cells=False, header=None, index=False)
The column names of new_wb.xlsx look like this:
col1 col2 col3 col1.1 col4 col1.2 col2.1 col5
test test test test test test test test
It appended .1 to the duplicate column names, and the suffix keeps incrementing for each further duplicate.
I tried renaming the columns before calling to_excel(), but it didn't work. It seems the renaming of duplicates happens in to_excel():
>> df1.rename(columns=lambda x: x.replace('.1',''))
Upon searching, I found the mangle_dupe_cols=False argument, but unfortunately it returned:
ValueError: Setting mangle_dupe_cols=False is not supported yet
Any help on how to ignore duplicate column names when saving with to_excel()?
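Note: the .1/.2 suffixes are added when the file is read, not when it is written, so one way around it is to stop pandas from treating the label row as column names in the first place. A minimal sketch, assuming the layout shown above (merged header in the first row, the real labels in the second) and reusing the example file names; in recent pandas versions the duplicate labels should then be written back out as-is, though behaviour may vary:

import pandas as pd

# Read with no header so the duplicate labels are never mangled,
# then promote the row that holds the real labels to the columns.
raw = pd.read_excel('old_wb.xlsx', header=None)   # row 0: merged header, row 1: labels

df1 = raw.iloc[2:].reset_index(drop=True)         # keep only the data rows
df1.columns = list(raw.iloc[1])                   # an Index may contain duplicate labels

df1.to_excel('new_wb.xlsx', sheet_name='Sheet1', index=False)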
@Ricky Aguilar has a great solution. I took his solution and just made it more dynamic.
Now you can rename all your duplicate headers without even knowing what their values are:
import pandas

def dataframe_allowing_duplicate_headers():
    # To Hold All The Possible Duplicate Tags ['.1', '.2', '.3', ...]
    dup_id_range = []
    # Load Your Excel File Using Pandas
    dataframe = pandas.read_excel("path_to_excel_file", sheet_name="sheetname")
    # Generate And Store All The Possible Duplicate Tags ['.1', '.2', '.3', ...]
    for count in range(1, len(dataframe.columns)):
        dup_id_range.append('.{}'.format(count))

    # Search And Replace All Duplicate Headers To What It Was Set As Originally
    def rename(dataframe, character_number):
        duplicate_columns = list(
            filter(lambda v: isinstance(v, str)
                   and v[len(v) - character_number:] in dup_id_range,
                   dataframe.columns))
        for duplicate_column in duplicate_columns:
            dataframe = dataframe.rename(
                columns={duplicate_column: duplicate_column[:-character_number]})
        return dataframe

    # Replace The Possible Duplicates Respectfully Based On Columns Count
    if len(dup_id_range) > 0:
        dataframe = rename(dataframe, 2)
    if len(dup_id_range) > 9:
        dataframe = rename(dataframe, 3)
    if len(dup_id_range) > 99:
        dataframe = rename(dataframe, 4)
    # If You Have More Than A Thousand Columns (lol)
    # if len(dup_id_range) > 999:
    #     dataframe = rename(dataframe, 5)

    return dataframe
USAGE:
# This Dataframe Will Have All Your Headers, Allowing Your Duplicates
my_dataframe = dataframe_allowing_duplicate_headers()
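If the end goal is the round trip from the question, the restored DataFrame can then be written back to Excel; a short hedged follow-up (the output file and sheet names here are just placeholders):

# Write the restored (duplicate) headers back out to a new file
my_dataframe.to_excel('new_wb.xlsx', sheet_name='Sheet1', index=False)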
It's workable using:
df1.rename(columns={'old_name':'new_name'})
Although it doesn't look good, as I have 10 columns to rename.
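If the only change needed is stripping the .N suffixes that the read step appends, a regex over df1.columns avoids spelling out all ten renames. A minimal sketch, assuming none of the real column names ends in a dot followed by digits:

import re

# Strip a trailing ".<digits>" from every column label in place
df1.columns = [re.sub(r'\.\d+$', '', str(col)) for col in df1.columns]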