I have an Excel file with about 500,000 rows and I want to split it into several Excel files, each with 50,000 rows.
I want to do it with pandas so it will be the quickest and easiest.
Any ideas how to do this?
Thank you for your help.
Assuming that your Excel file has only one (first) sheet containing data, my first thought was to make use of a chunksize parameter:
import pandas as pd
import numpy as np

i = 0
for df in pd.read_excel(file_name, chunksize=50000):
    df.to_excel('/path/to/file_{:02d}.xlsx'.format(i), index=False)
    i += 1
UPDATE: it turns out pd.read_excel() doesn't support the chunksize parameter (unlike pd.read_csv()), so read the whole sheet into one DataFrame and split it instead:
chunksize = 50000
df = pd.read_excel(file_name)
for i, chunk in enumerate(np.split(df, len(df) // chunksize)):
    chunk.to_excel('/path/to/file_{:02d}.xlsx'.format(i), index=False)
Use np.array_split, as per this answer (https://stackoverflow.com/a/17315875/1394890), if you get
array split does not result in an equal division
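For example, with roughly 500,000 rows the total is unlikely to be an exact multiple of 50,000. A minimal sketch of that variant (file_name and the output path are placeholders, as above) uses ceiling division so the leftover rows simply go into a smaller final file:

import pandas as pd
import numpy as np

chunksize = 50000
df = pd.read_excel(file_name)  # file_name: placeholder path to the source workbook

# Ceiling division so a partial final chunk still gets its own file;
# np.array_split allows unequal pieces, unlike np.split.
n_chunks = -(-len(df) // chunksize)
for i, chunk in enumerate(np.array_split(df, n_chunks)):
    chunk.to_excel('/path/to/file_{:02d}.xlsx'.format(i), index=False)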
As explained by MaxU, I'll also use a chunksize variable and split the total number of rows in the large file into chunks of the required size.
import pandas as pd
import numpy as np

chunksize = 50000
df = pd.read_excel("path/to/file.xlsx")

# np.split requires len(df) to be evenly divisible by chunksize;
# switch to np.array_split if it is not.
for i, chunk in enumerate(np.split(df, len(df) // chunksize)):
    # index=True keeps the original row numbers in each output file
    chunk.to_excel('path/to/destination/folder/file_{:02d}.xlsx'.format(i), index=True)
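If you'd rather avoid NumPy altogether, a plain slicing loop over df.iloc does the same job; this is just a sketch using the same placeholder paths, and the last slice may simply contain fewer than 50,000 rows:

import pandas as pd

chunksize = 50000
df = pd.read_excel("path/to/file.xlsx")

# Step through the DataFrame in blocks of chunksize rows.
for i, start in enumerate(range(0, len(df), chunksize)):
    df.iloc[start:start + chunksize].to_excel(
        'path/to/destination/folder/file_{:02d}.xlsx'.format(i), index=False)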
Hope this helps you.