I have an Excel file with about 500,000 rows and I want to split it into several Excel files, each with 50,000 rows.
I want to do it with pandas so it will be the quickest and easiest.
Any ideas how to do this?
Thank you for your help.
Assuming that your Excel file has only one (first) sheet containing data, my first thought was to make use of a chunksize parameter:
import pandas as pd
import numpy as np

i = 0
for df in pd.read_excel(file_name, chunksize=50000):
    df.to_excel('/path/to/file_{:02d}.xlsx'.format(i), index=False)
    i += 1
UPDATE: it turns out pd.read_excel() doesn't support the chunksize parameter (unlike pd.read_csv()), so read the whole sheet into one DataFrame and split it instead:
chunksize = 50000
df = pd.read_excel(file_name)
for i, chunk in enumerate(np.split(df, len(df) // chunksize)):
    chunk.to_excel('/path/to/file_{:02d}.xlsx'.format(i), index=False)
Use np.array_split, as per this answer (https://stackoverflow.com/a/17315875/1394890), if you get
array split does not result in an equal division
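For example, with roughly 500,000 rows the total is unlikely to be an exact multiple of 50,000. A minimal sketch of that variant (file_name and the output path are placeholders, as above) uses ceiling division so the leftover rows simply go into a smaller final file:

import pandas as pd
import numpy as np

chunksize = 50000
df = pd.read_excel(file_name)  # file_name: placeholder path to the source workbook

# Ceiling division so a partial final chunk still gets its own file;
# np.array_split allows unequal pieces, unlike np.split.
n_chunks = -(-len(df) // chunksize)
for i, chunk in enumerate(np.array_split(df, n_chunks)):
    chunk.to_excel('/path/to/file_{:02d}.xlsx'.format(i), index=False)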
As explained by MaxU, I'll also use a chunksize variable and split the total number of rows in the large file into chunks of the required size.
import pandas as pd
import numpy as np

chunksize = 50000
df = pd.read_excel("path/to/file.xlsx")

# np.split requires len(df) to be evenly divisible by chunksize;
# switch to np.array_split if it is not.
for i, chunk in enumerate(np.split(df, len(df) // chunksize)):
    # index=True keeps the original row numbers in each output file
    chunk.to_excel('path/to/destination/folder/file_{:02d}.xlsx'.format(i), index=True)
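If you'd rather avoid NumPy altogether, a plain slicing loop over df.iloc does the same job; this is just a sketch using the same placeholder paths, and the last slice may simply contain fewer than 50,000 rows:

import pandas as pd

chunksize = 50000
df = pd.read_excel("path/to/file.xlsx")

# Step through the DataFrame in blocks of chunksize rows.
for i, start in enumerate(range(0, len(df), chunksize)):
    df.iloc[start:start + chunksize].to_excel(
        'path/to/destination/folder/file_{:02d}.xlsx'.format(i), index=False)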
Hope this helps you.