I have a dataset that I'm ingesting into Python and applying several transformations to. Once the code has run, I want to publish the output to an Excel file, but split across multiple sheets, with each sheet containing its own header. I've tried the following approaches:
1: If I group by a specific column, it raises an error that the sheet is too large (which is why I want to split by 1M rows instead):
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')

for key, g in icms_data.groupby('New_Loc'):
    g.to_excel(writer, sheet_name=key, index=False, header=True)

writer.save()
print('done')
2: I've tried looping through the data one million rows at a time, but it takes far too long to run and to create the Excel file:
GROUP_LENGTH = 1000000
writer = pd.ExcelWriter('test.xlsx')

for i in range(0, len(icms_data), GROUP_LENGTH):
    print(i)
    icms_data.iloc[i:i + GROUP_LENGTH, ].to_excel(writer, sheet_name='Row {}'.format(i), header=True)

writer.save()
print('done')
The file may have 2M, 3M, or 4M rows depending on when it's downloaded. Is it possible to have code that goes through the whole dataframe, splits it into 1M-row chunks, and saves those chunks to different sheets?
You can slice the dataframe in a for loop and write the slices to sheets:
import pandas as pd

GROUP_LENGTH = 1000000  # number of rows per sheet

with pd.ExcelWriter('output.xlsx') as writer:
    for i in range(0, len(df), GROUP_LENGTH):
        df[i:i + GROUP_LENGTH].to_excel(writer, sheet_name='Row {}'.format(i), index=False, header=True)
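Since the slow write was the main complaint, it may also help to use the xlsxwriter engine, which is usually faster than the default openpyxl for writing large files. The sketch below is the same slicing loop under that assumption (xlsxwriter installed via pip; naming the sheets by chunk index is just illustrative):

import pandas as pd

GROUP_LENGTH = 1_000_000  # just under Excel's per-sheet limit of 1,048,576 rows

# Same approach, but written with the xlsxwriter engine (pip install xlsxwriter),
# which is typically quicker than openpyxl for write-only workloads.
with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer:
    for n, i in enumerate(range(0, len(df), GROUP_LENGTH), start=1):
        # each 1M-row slice goes to its own sheet
        df.iloc[i:i + GROUP_LENGTH].to_excel(writer, sheet_name=f'Sheet{n}', index=False)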
This depends on the total size of the dataframe, as Excel has workbook size limits, but you can try something like:
import math

import numpy as np
import pandas as pd

GROUP_LENGTH = 1000000
writer = pd.ExcelWriter('test.xlsx')
number_of_chunks = math.ceil(len(df) / GROUP_LENGTH)
chunks = np.array_split(df, number_of_chunks)

for sheet_number, chunk in enumerate(chunks, start=1):
    # sheet names need to be strings, so build one from the chunk index
    chunk.to_excel(writer, sheet_name=f'Sheet{sheet_number}')

writer.close()  # save() is deprecated in recent pandas; close() writes the file
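Note that each sheet in an .xlsx workbook is capped at 1,048,576 rows, so GROUP_LENGTH has to stay at or below that. Also, np.array_split balances the chunk sizes: with 3.5M rows and GROUP_LENGTH = 1,000,000 you get four sheets of roughly 875k rows each, rather than three full sheets and one partial one.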