Writing pandas DataFrame to Excel with different formats for different columns

Tags: python, pandas, excel, openpyxl

I am trying to write a pandas DataFrame to an .xlsx file where different numerical columns would have different formats. For example, some would show only two decimal places, some would show none, some would be formatted as percents with a "%" symbol, etc.

I noticed that DataFrame.to_html() has a formatters parameter that allows one to do just that, mapping different formats to different columns. However, there is no similar parameter on the DataFrame.to_excel() method. The most we have is a float_format that is global to all numbers.

I have read many SO posts that are at least partly related to my question, for example:

- Use the older openpyxl engine to apply formats one cell at a time. This is the approach with which I've had the most success, but it means writing loops to apply formats cell by cell, remembering offsets, etc.
- Render percentages by changing the table data itself into strings. Altering the actual data inspired me to handle decimal places by calling round() on each column before writing to Excel. This works too, but I'd like to avoid altering the data.
- Assorted others, mostly about date formats.

Are there other more convenient Excel-related functions/properties in the pandas API that can help here, or something similar on openpyxl, or perhaps some way to specify output format metadata directly onto each column in the DataFrame that would then be interpreted downstream by different outputters?

asked Apr 30 '15 18:04

sparc_spread

2 Answers

You can do this with Pandas 0.16 and the XlsxWriter engine by accessing the underlying workbook and worksheet objects:

import pandas as pd

# Create a Pandas dataframe from some data.
# Wrap zip() in list() so the constructor receives concrete rows.
df = pd.DataFrame(list(zip(
    [1010, 2020, 3030, 2020, 1515, 3030, 4545],
    [.1, .2, .33, .25, .5, .75, .45],
    [.1, .2, .33, .25, .5, .75, .45],
)))

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter objects from the dataframe writer object.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})
format3 = workbook.add_format({'num_format': 'h:mm:ss AM/PM'})

# Set the column width and format.
worksheet.set_column('B:B', 18, format1)

# Set the format but not the column width.
worksheet.set_column('C:C', None, format2)

worksheet.set_column('D:D', 16, format3)

# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also writes the file.)
writer.close()

See also "Working with Python Pandas and XlsxWriter" in the XlsxWriter documentation.
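
The same idea can be wrapped in a small helper that takes a mapping from column names to Excel number formats, loosely mirroring the formatters parameter of to_html(). This is only a sketch: write_with_column_formats, the sample column names, and the file name formatted.xlsx are made up for illustration and are not part of pandas or XlsxWriter. It assumes unique column names and that the index is written (the to_excel() default), which shifts the data columns to the right.

```python
from pathlib import Path

import pandas as pd
from xlsxwriter.utility import xl_col_to_name


def write_with_column_formats(df, path, formats, sheet_name="Sheet1"):
    """Write df to path with one Excel number format per named column.

    Hypothetical helper: `formats` maps column names to Excel
    number-format strings. Returns {column letter: number format}.
    """
    applied = {}
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]

        # to_excel() writes the index levels first, so every data
        # column sits `offset` columns to the right of its position.
        offset = df.index.nlevels

        for name, num_format in formats.items():
            col = df.columns.get_loc(name) + offset
            cell_format = workbook.add_format({"num_format": num_format})
            # width=None keeps the default width and sets only the format.
            worksheet.set_column(col, col, None, cell_format)
            applied[xl_col_to_name(col)] = num_format
    return applied


# Example data (made up for illustration).
df = pd.DataFrame({"amount": [1010, 2020.5], "share": [0.1, 0.25]})
out_path = Path("formatted.xlsx")
applied = write_with_column_formats(
    df, out_path, {"amount": "#,##0.00", "share": "0%"}
)
print(applied)
```

The values stored in the workbook stay numeric; only their display changes. Column formats set this way do not override cells that pandas already formats itself, such as the header row, the index, and datetime cells.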

answered Sep 28 '22 07:09

jmcnamara

As you rightly point out, applying formats to individual cells is extremely inefficient.

openpyxl 2.4 includes native support for pandas DataFrames and named styles.

https://openpyxl.readthedocs.io/en/latest/changes.html#id7
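
A minimal sketch of how those two features might be combined, assuming openpyxl 2.4 or later. The style names, sample columns, and the file name styled.xlsx are invented for illustration. Note that a named style is still assigned to each cell; the saving is that the style is defined once in the workbook and referenced by name, rather than rebuilt per cell.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import NamedStyle
from openpyxl.utils.dataframe import dataframe_to_rows

# Example data (made up for illustration).
df = pd.DataFrame({"amount": [1010, 2020.5], "share": [0.1, 0.25]})

wb = Workbook()
ws = wb.active

# Copy the DataFrame into the sheet: header row first, no index column.
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Define one named style per column and register it with the workbook.
styles = {
    "amount": NamedStyle(name="amount_style", number_format="#,##0.00"),
    "share": NamedStyle(name="share_style", number_format="0%"),
}
for style in styles.values():
    wb.add_named_style(style)

# Assign the styles by name, skipping the header in row 1.
for col_idx, column_name in enumerate(df.columns, start=1):
    style = styles.get(column_name)
    if style is None:
        continue
    for (cell,) in ws.iter_rows(min_row=2, min_col=col_idx, max_col=col_idx):
        cell.style = style.name

wb.save("styled.xlsx")
```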

answered Sep 28 '22 07:09

Charlie Clark
