I'm generating a large dataframe (1.5 GB when saved in CSV format) and need to store it an worksheet of an Excel file along with a second (much smaller) dataframe which is saved in a separate worksheet.
print('Reading temporaty files for variable {}:'.format(Var))
print(' Reading stations')
s=pd.read_csv(StatFile,sep=':',dtype={'ID': 'str'},encoding='utf-8')
print(' Reading data')
d=pd.read_csv(DataFile,sep=':',dtype='str',encoding='utf-8').transpose()
d.columns = d.iloc[0]
d=d[1:].astype('float')
d.reindex_axis(sorted(d.columns), axis=1)
print('Writing out Excel file for variable {}'.format(Var))
writer = pd.ExcelWriter(Path + Var + '.xlsx', engine='xlsxwriter')
d.to_excel(writer, sheet_name='Data')
OutStatCol=['ID','Name','Longitude','Latitude','GRS','OriginalVariable','VariableUnits','URL','JsonNode']
s.to_excel(writer, columns=OutStatCol, index=False, sheet_name='Stations')
writer.save()
My code works fine for smaller dataframes, but with the large ones I get the following error:
Traceback (most recent call last):
File "./Test2.py", line 29, in <module>
writer.save()
File "/home/user/miniconda2/lib/python2.7/site-packages/pandas/io/excel.py", line 1413, in save
return self.book.close()
File "/home/user/miniconda2/lib/python2.7/site-packages/xlsxwriter/workbook.py", line 297, in close
self._store_workbook()
File "/home/user/miniconda2/lib/python2.7/site-packages/xlsxwriter/workbook.py", line 624, in _store_workbook
xlsx_file.write(os_filename, xml_filename)
File "/home/user/miniconda2/lib/python2.7/zipfile.py", line 1148, in write
self._writecheck(zinfo)
File "/home/user/miniconda2/lib/python2.7/zipfile.py", line 1114, in _writecheck
" would require ZIP64 extensions")
zipfile.LargeZipFile: Filesize would require ZIP64 extensions
Is there any way I can specify something like allowZip64=True
in the ExcelWriter declaration or in the to_excel() method?
Thanks!
Write object to an Excel sheet. To write a single object to an Excel .xlsx file it is only necessary to specify a target file name. To write to multiple sheets it is necessary to create an ExcelWriter object with a target file name, and specify a sheet in the file to write to.
The short answer is yes, there is a size limit for pandas DataFrames, but it's so large you will likely never have to worry about it. The long answer is the size limit for pandas DataFrames is 100 gigabytes (GB) of memory instead of a set number of cells.
This took some source code digging, but...
print('Reading temporaty files for variable {}:'.format(Var))
print(' Reading stations')
s=pd.read_csv(StatFile,sep=':',dtype={'ID': 'str'},encoding='utf-8')
print(' Reading data')
d=pd.read_csv(DataFile,sep=':',dtype='str',encoding='utf-8').transpose()
d.columns = d.iloc[0]
d=d[1:].astype('float')
d.reindex_axis(sorted(d.columns), axis=1)
print('Writing out Excel file for variable {}'.format(Var))
writer = pd.ExcelWriter(Path + Var + '.xlsx', engine='xlsxwriter')
#THIS
writer.book.use_zip64()
d.to_excel(writer, sheet_name='Data')
OutStatCol=['ID','Name','Longitude','Latitude','GRS','OriginalVariable','VariableUnits','URL','JsonNode']
s.to_excel(writer, columns=OutStatCol, index=False, sheet_name='Stations')
writer.save()
should work
figuring out that the writer
didn't inherit from workbook took me longer than it should have. writer.book
is directly a workbook instance... d'oh
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With