
Splitting a Dataframe into multiple sheets

Tags: python, pandas

I have a dataset that I'm ingesting into Python and running several transformations on. Once that is done, I want to publish the output to an Excel file, split across multiple sheets, with each sheet containing its own header. I've tried the following:

1: If I group by a specific column, I get an error that the sheet is too large (which is why I want to split by 1M rows):

writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
        
for key,g in icms_data.groupby('New_Loc'):    
    g.to_excel(writer, sheet_name=key, index=False, header=True)

writer.save()
print('done')

2: I've tried stepping through the dataframe one million rows at a time, but the code takes far too long to run and create the Excel file:

GROUP_LENGTH = 1000000

writer = pd.ExcelWriter('test.xlsx')

for i in range(0, len(icms_data), GROUP_LENGTH):
    print(i)
    icms_data.iloc[i:i+GROUP_LENGTH,].to_excel(writer, sheet_name='Row {}'.format(i), header=True)

writer.save()
print('done') 

The file may have 2M, 3M, or 4M rows depending on when it's downloaded. Is it possible to write code that goes through the whole dataframe, splits it into 1M-row chunks, and saves those chunks to different sheets?

silentninja89 asked Sep 14 '25 13:09



2 Answers

You can slice the dataframe in a for loop and write the slices to sheets:

import pandas as pd

GROUP_LENGTH = 1000000  # number of rows per sheet

with pd.ExcelWriter('output.xlsx') as writer:
    for i in range(0, len(df), GROUP_LENGTH):
        # each slice goes to its own sheet, with its own header row
        df[i : i+GROUP_LENGTH].to_excel(writer, sheet_name='Row {}'.format(i), index=False, header=True)
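
If it helps to see the slicing logic on its own, here is a minimal sketch that separates computing the row ranges from writing them. The names `chunk_bounds`, `write_in_sheets`, and the `Part N` sheet names are hypothetical and not from the question; the sketch also assumes the dataframe has at least one row and that an Excel engine such as openpyxl or xlsxwriter is installed.

```python
def chunk_bounds(n_rows, chunk_size):
    """Return (start, stop) row positions covering n_rows in steps of chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n_rows))
            for start in range(0, n_rows, chunk_size)]


def write_in_sheets(df, path, chunk_size=1_000_000):
    """Write df to path, one sheet per chunk (hypothetical helper).

    Assumes df is a non-empty pandas DataFrame.
    """
    import pandas as pd  # imported here so chunk_bounds works without pandas

    with pd.ExcelWriter(path) as writer:
        bounds = chunk_bounds(len(df), chunk_size)
        for number, (start, stop) in enumerate(bounds, start=1):
            # every call to to_excel writes its own header row on that sheet
            df.iloc[start:stop].to_excel(
                writer, sheet_name=f"Part {number}", index=False, header=True
            )
```

With 2.5M rows and the default chunk size, `chunk_bounds` yields three ranges, so `write_in_sheets(df, 'output.xlsx')` would produce sheets `Part 1` to `Part 3`, the last one holding the remaining 500,000 rows.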
RJ Adriaansen answered Sep 16 '25 02:09



This depends on the total size of the dataframe, since Excel has workbook size limits (for example, 1,048,576 rows per sheet), but you can try something like:

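import math

import numpy as np
import pandas as pd

GROUP_LENGTH = 1000000
# number of roughly equal chunks, each with at most GROUP_LENGTH rows
number_of_chunks = math.ceil(len(df) / GROUP_LENGTH)
chunks = np.array_split(df, number_of_chunks)

# the context manager saves and closes the file on exit
with pd.ExcelWriter('test.xlsx') as writer:
    for sheet_number, chunk in enumerate(chunks):
        # sheet names must be strings, so convert the counter
        chunk.to_excel(writer, sheet_name=str(sheet_number))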
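
To make the size limits concrete, the following sketch checks a chosen chunk size against the worksheet limits of the .xlsx format (1,048,576 rows and 16,384 columns per sheet) before anything is written. The function name `plan_sheets` is hypothetical, and the sketch assumes the index is not written (`index=False`), so it does not take up an extra column.

```python
import math

EXCEL_MAX_ROWS = 1_048_576  # row limit per worksheet in .xlsx
EXCEL_MAX_COLS = 16_384     # column limit per worksheet in .xlsx


def plan_sheets(n_rows, n_cols, chunk_size, header=True):
    """Return how many sheets are needed, or raise if a chunk cannot fit."""
    # the header, if written, occupies one row on every sheet
    rows_on_sheet = chunk_size + (1 if header else 0)
    if rows_on_sheet > EXCEL_MAX_ROWS:
        raise ValueError("chunk_size plus header exceeds the per-sheet row limit")
    if n_cols > EXCEL_MAX_COLS:
        raise ValueError("too many columns for a single sheet")
    return math.ceil(n_rows / chunk_size)
```

For example, `plan_sheets(len(df), df.shape[1], 1_000_000)` on a 3.4M-row dataframe returns 4, and a chunk size of 1,048,576 with a header is rejected because the header pushes the sheet one row over the limit.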
Boskosnitch answered Sep 16 '25 03:09
