It is quite easy to add many pandas dataframes into excel work book as long as it is different worksheets. But, it is somewhat tricky to get many dataframes into one worksheet if you want to use pandas built-in df.to_excel functionality.
# Creating Excel Writer Object from Pandas writer = pd.ExcelWriter('test.xlsx',engine='xlsxwriter') workbook=writer.book worksheet=workbook.add_worksheet('Validation') df.to_excel(writer,sheet_name='Validation',startrow=0 , startcol=0) another_df.to_excel(writer,sheet_name='Validation',startrow=20, startcol=0)
The above code won't work. You will get the error of
Sheetname 'Validation', with case ignored, is already in use.
Now, I have experimented enough that I found a way to make it work.
writer = pd.ExcelWriter('test.xlsx',engine='xlsxwriter') # Creating Excel Writer Object from Pandas workbook=writer.book df.to_excel(writer,sheet_name='Validation',startrow=0 , startcol=0) another_df.to_excel(writer,sheet_name='Validation',startrow=20, startcol=0)
This will work. So, my purpose of posting this question on stackoverflow is twofold. Firstly, I hope this will help someone if he/she is trying to put many dataframes into a single work sheet at excel.
Secondly, Can someone help me understand the difference between those two blocks of code? It appears to me that they are pretty much the same except the first block of code created worksheet called "Validation" in advance while the second does not. I get that part.
What I don't understand is why should it be any different ? Even if I don't create the worksheet in advance, this line, the line right before the last one,
df.to_excel(writer,sheet_name='Validation',startrow=0 , startcol=0)
will create a worksheet anyway. Consequently, by the time we reached the last line of code the worksheet "Validation" is already created as well in the second block of code. So, my question basically, why should the second block of code work while the first doesn't?
Please also share if there is another way to put many dataframes into excel using the built-in df.to_excel functionality !!
To write to multiple sheets it is necessary to create an ExcelWriter object with a target file name, and specify a sheet in the file to write to. Multiple sheets may be written to by specifying unique sheet_name . With all data written to the file it is necessary to save the changes.
The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.
To create the Worksheet in advance, you need to add the created sheet to the sheets
dict:
writer.sheets['Validation'] = worksheet
Using your original code:
# Creating Excel Writer Object from Pandas writer = pd.ExcelWriter('test.xlsx',engine='xlsxwriter') workbook=writer.book worksheet=workbook.add_worksheet('Validation') writer.sheets['Validation'] = worksheet df.to_excel(writer,sheet_name='Validation',startrow=0 , startcol=0) another_df.to_excel(writer,sheet_name='Validation',startrow=20, startcol=0)
If we look at the pandas function to_excel
, it uses the writer's write_cells
function:
excel_writer.write_cells(formatted_cells, sheet_name, startrow=startrow, startcol=startcol)
So looking at the write_cells
function for xlsxwriter
:
def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0): # Write the frame cells using xlsxwriter. sheet_name = self._get_sheet_name(sheet_name) if sheet_name in self.sheets: wks = self.sheets[sheet_name] else: wks = self.book.add_worksheet(sheet_name) self.sheets[sheet_name] = wks
Here we can see that it checks for sheet_name
in self.sheets
, and so it needs to be added there as well.
user3817518: "Please also share if there is another way to put many dataframes into excel using the built-in df.to_excel functionality !!"
Here's my attempt:
Easy way to put together a lot of dataframes on just one sheet or across multiple tabs. Let me know if this works!
-- To test, just run the sample dataframes and the second and third portion of code.
import pandas as pd import numpy as np # Sample dataframes randn = np.random.randn df = pd.DataFrame(randn(15, 20)) df1 = pd.DataFrame(randn(10, 5)) df2 = pd.DataFrame(randn(5, 10))
# funtion def multiple_dfs(df_list, sheets, file_name, spaces): writer = pd.ExcelWriter(file_name,engine='xlsxwriter') row = 0 for dataframe in df_list: dataframe.to_excel(writer,sheet_name=sheets,startrow=row , startcol=0) row = row + len(dataframe.index) + spaces + 1 writer.save() # list of dataframes dfs = [df,df1,df2] # run function multiple_dfs(dfs, 'Validation', 'test1.xlsx', 1)
# function def dfs_tabs(df_list, sheet_list, file_name): writer = pd.ExcelWriter(file_name,engine='xlsxwriter') for dataframe, sheet in zip(df_list, sheet_list): dataframe.to_excel(writer, sheet_name=sheet, startrow=0 , startcol=0) writer.save() # list of dataframes and sheet names dfs = [df, df1, df2] sheets = ['df','df1','df2'] # run function dfs_tabs(dfs, sheets, 'multi-test.xlsx')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With