Save large pandas dataframe to excel

Tags: python, pandas, export-to-excel

I'm generating a large dataframe (1.5 GB when saved in CSV format) and need to store it in one worksheet of an Excel file, along with a second (much smaller) dataframe that is saved in a separate worksheet.

print('Reading temporary files for variable {}:'.format(Var))
print(' Reading stations')
s=pd.read_csv(StatFile,sep=':',dtype={'ID': 'str'},encoding='utf-8')
print(' Reading data')
d=pd.read_csv(DataFile,sep=':',dtype='str',encoding='utf-8').transpose()
d.columns = d.iloc[0]
d=d[1:].astype('float')
d = d.reindex_axis(sorted(d.columns), axis=1)  # reindex_axis returns a new frame, so keep the result
print('Writing out Excel file for variable {}'.format(Var))
writer = pd.ExcelWriter(Path + Var + '.xlsx', engine='xlsxwriter')
d.to_excel(writer, sheet_name='Data')
OutStatCol=['ID','Name','Longitude','Latitude','GRS','OriginalVariable','VariableUnits','URL','JsonNode']
s.to_excel(writer, columns=OutStatCol, index=False, sheet_name='Stations')
writer.save()

My code works fine for smaller dataframes, but with the large ones I get the following error:

Traceback (most recent call last):
  File "./Test2.py", line 29, in <module>
    writer.save()
  File "/home/user/miniconda2/lib/python2.7/site-packages/pandas/io/excel.py", line 1413, in save
    return self.book.close()
  File "/home/user/miniconda2/lib/python2.7/site-packages/xlsxwriter/workbook.py", line 297, in close
    self._store_workbook()
  File "/home/user/miniconda2/lib/python2.7/site-packages/xlsxwriter/workbook.py", line 624, in _store_workbook
    xlsx_file.write(os_filename, xml_filename)
  File "/home/user/miniconda2/lib/python2.7/zipfile.py", line 1148, in write
    self._writecheck(zinfo)
  File "/home/user/miniconda2/lib/python2.7/zipfile.py", line 1114, in _writecheck
    " would require ZIP64 extensions")
zipfile.LargeZipFile: Filesize would require ZIP64 extensions

Is there any way I can specify something like allowZip64=True in the ExcelWriter declaration or in the to_excel() method?

Thanks!

asked Oct 21 '16 18:10

user6357781

1 Answer

This took some source code digging, but calling use_zip64() on the writer's workbook before writing any sheets does it:

print('Reading temporary files for variable {}:'.format(Var))
print(' Reading stations')
s=pd.read_csv(StatFile,sep=':',dtype={'ID': 'str'},encoding='utf-8')
print(' Reading data')
d=pd.read_csv(DataFile,sep=':',dtype='str',encoding='utf-8').transpose()
d.columns = d.iloc[0]
d=d[1:].astype('float')
d = d.reindex_axis(sorted(d.columns), axis=1)  # reindex_axis returns a new frame, so keep the result
print('Writing out Excel file for variable {}'.format(Var))
writer = pd.ExcelWriter(Path + Var + '.xlsx', engine='xlsxwriter')

# The fix: enable ZIP64 extensions on the underlying XlsxWriter workbook
writer.book.use_zip64()

d.to_excel(writer, sheet_name='Data')
OutStatCol=['ID','Name','Longitude','Latitude','GRS','OriginalVariable','VariableUnits','URL','JsonNode']
s.to_excel(writer, columns=OutStatCol, index=False, sheet_name='Stations')
writer.save()

That should work.

Figuring out that the writer doesn't inherit from Workbook took me longer than it should have. writer.book is itself an XlsxWriter Workbook instance... d'oh.
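For readers on a newer pandas, where writer.save() is no longer available, here is a minimal self-contained sketch of the same idea using ExcelWriter as a context manager (the file is saved when the with block exits). The helper name write_frames_zip64 and the tiny sample frames are made up for illustration, and the sketch assumes both pandas and xlsxwriter are installed:

```python
import os
import tempfile
import zipfile

import pandas as pd


def write_frames_zip64(path, frames):
    """Write each DataFrame in `frames` (sheet name -> frame) to one .xlsx file.

    Hypothetical helper: ZIP64 is enabled on the XlsxWriter workbook
    before any sheet is written.
    """
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        # writer.book is the xlsxwriter Workbook; use_zip64() allows the
        # resulting archive to exceed the classic ZIP size limits.
        writer.book.use_zip64()
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name)


if __name__ == "__main__":
    # Tiny stand-ins for the large data frame and the stations frame.
    data = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
    stations = pd.DataFrame({"ID": ["001", "002"], "Name": ["North", "South"]})
    out = os.path.join(tempfile.mkdtemp(), "example.xlsx")
    write_frames_zip64(out, {"Data": data, "Stations": stations})
    print(zipfile.is_zipfile(out))
```

Calling use_zip64() on small files is harmless, so it can be left on unconditionally if the output size is not known in advance.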

answered Sep 16 '22 14:09

Aaron
