
Writing pandas DataFrame to Excel with different formats for different columns

I am trying to write a pandas DataFrame to an .xlsx file where different numerical columns would have different formats. For example, some would show only two decimal places, some would show none, some would be formatted as percents with a "%" symbol, etc.

I noticed that DataFrame.to_html() has a formatters parameter that allows one to do just that, mapping different formats to different columns. However, there is no similar parameter on the DataFrame.to_excel() method. The most we have is a float_format that is global to all numbers.
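For reference, here is a minimal sketch of the per-column behaviour that to_html() offers through formatters. The column names and format strings are made up for illustration; they are not from my real data.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "units": [1010.0, 2020.0],
    "price": [3.14159, 2.71828],
    "share": [0.125, 0.875],
})

# One formatter per column: each is a one-argument function returning a string.
formatters = {
    "units": "{:,.0f}".format,   # thousands separator, no decimals
    "price": "{:.2f}".format,    # two decimal places
    "share": "{:.1%}".format,    # percentage with a "%" symbol
}

html = df.to_html(formatters=formatters)
print(html)
```

The underlying DataFrame values are untouched; only the rendered text changes. That is the behaviour I would like to reproduce for Excel output.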

I have read many SO posts that are at least partly related to my question, for example:

  • Use the older openpyxl engine to apply formats one cell at a time. This is the approach with which I've had the most success. But it means writing loops to apply formats cell-by-cell, remembering offsets, etc.
  • Render percentages by changing the table data itself into strings. Going the route of altering the actual data inspired me to try dealing with decimal place formatting by calling round() on each column before writing to Excel - this works too, but I'd like to avoid altering the data.
  • Assorted others, mostly about date formats

Are there other more convenient Excel-related functions/properties in the pandas API that can help here, or something similar on openpyxl, or perhaps some way to specify output format metadata directly onto each column in the DataFrame that would then be interpreted downstream by different outputters?

sparc_spread asked Apr 30 '15 18:04



2 Answers

You can do this with Pandas 0.16 and the XlsxWriter engine by accessing the underlying workbook and worksheet objects:

import pandas as pd

# Create a Pandas dataframe from some data.
# list() makes this work whether zip() returns a list or an iterator.
df = pd.DataFrame(list(zip(
    [1010, 2020, 3030, 2020, 1515, 3030, 4545],
    [.1, .2, .33, .25, .5, .75, .45],
    [.1, .2, .33, .25, .5, .75, .45],
)))

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter objects from the dataframe writer object.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})
format3 = workbook.add_format({'num_format': 'h:mm:ss AM/PM'})

# Set the column width and format.
worksheet.set_column('B:B', 18, format1)

# Set the format but not the column width.
worksheet.set_column('C:C', None, format2)

worksheet.set_column('D:D', 16, format3)

# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also writes the file.)
writer.close()


See also Working with Python Pandas and XlsxWriter.
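If you would rather address columns by name than by letter, the same idea can be wrapped in a small helper. This is only a sketch: the function name write_with_column_formats, the sample column names, and the temporary output path are assumptions made for illustration. It writes with index=False so that DataFrame column i lands in worksheet column i, and it assumes the column names are unique.

```python
import os
import tempfile

import pandas as pd


def write_with_column_formats(df, path, num_formats, sheet_name="Sheet1"):
    """Write df to path, applying one Excel number format per named column.

    num_formats maps a DataFrame column name to an Excel format string.
    Returns {column name: zero-based worksheet column index}.
    """
    positions = {}
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        # index=False keeps DataFrame column i in worksheet column i.
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]
        for name, num_format in num_formats.items():
            col = int(df.columns.get_loc(name))  # assumes unique column names
            cell_format = workbook.add_format({"num_format": num_format})
            # A width of None leaves the column width unchanged.
            worksheet.set_column(col, col, None, cell_format)
            positions[name] = col
    return positions


# Hypothetical sample data and output location, for illustration only.
df = pd.DataFrame({
    "units": [1010, 2020, 3030],
    "price": [3.14159, 2.71828, 1.41421],
    "share": [0.1, 0.25, 0.5],
})
out_path = os.path.join(tempfile.mkdtemp(), "formatted.xlsx")
positions = write_with_column_formats(
    df, out_path, {"price": "#,##0.00", "share": "0%"}
)
print(positions)
```

If you keep the index in the output, each position has to be shifted by the number of index levels, which is exactly the kind of offset bookkeeping the question mentions.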

jmcnamara answered Sep 28 '22 07:09


As you rightly point out, applying formats to individual cells is extremely inefficient.

openpyxl 2.4 includes native support for Pandas Dataframes and named styles.

https://openpyxl.readthedocs.io/en/latest/changes.html#id7
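A rough sketch of how those two features might be combined follows. The sample data and the style names two_decimals and whole_percent are invented for illustration. Each named style is registered once on the workbook and then referenced by name, so the cells in a column share one style definition, although the assignment itself still visits each cell.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import NamedStyle
from openpyxl.utils.dataframe import dataframe_to_rows

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "units": [1010, 2020, 3030],
    "price": [3.14159, 2.71828, 1.41421],
    "share": [0.1, 0.25, 0.5],
})

wb = Workbook()
ws = wb.active

# Copy the DataFrame into the worksheet: header in row 1, data from row 2.
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# One named style per column that needs a number format.
column_styles = {
    "price": NamedStyle(name="two_decimals", number_format="0.00"),
    "share": NamedStyle(name="whole_percent", number_format="0%"),
}

for column_name, style in column_styles.items():
    wb.add_named_style(style)
    col = df.columns.get_loc(column_name) + 1  # openpyxl columns are 1-based
    # Skip the header row and reference the registered style by name.
    for (cell,) in ws.iter_rows(min_row=2, min_col=col, max_col=col):
        cell.style = style.name

# wb.save("styled.xlsx")  # uncomment to write the file
```

The stored values are unchanged; only the number format Excel uses to display them differs from column to column.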

Charlie Clark answered Sep 28 '22 07:09