How can I split a large CSV file (7GB) in Python?


I have a 7GB CSV file that I'd like to split into smaller chunks so it is readable and faster to analyze in Python on a notebook. I would like to grab a small subset of it, maybe 250MB. How can I do this?


asked Nov 17 '13 17:11

Sohail

3 Answers

You don't need Python to split a CSV file. Using your shell:

$ split -l 100 data.csv

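This splits data.csv into chunks of 100 lines each, written to files named xaa, xab, and so on. Note that the header line ends up only in the first chunk.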
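The question asks for one sample of roughly 250MB rather than many small pieces. Staying with line-based splitting, a pure-Python sketch that copies the header plus as many whole lines as fit under a byte budget might look like this. The function name take_sample and the 250MB default are illustrative assumptions, and like split -l it assumes no quoted field spans multiple lines.

```python
def take_sample(src_path, dst_path, max_bytes=250 * 1024 * 1024):
    """Copy the header plus as many complete lines as fit in max_bytes.

    Hypothetical helper. Files are opened in binary mode so len(line) is an
    exact byte count and no decoding is needed. Assumes one record per line.
    Returns the number of bytes written.
    """
    written = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        header = src.readline()
        dst.write(header)  # always keep the column names
        written += len(header)
        for line in src:
            if written + len(line) > max_bytes:
                break  # stop before a line would exceed the budget
            dst.write(line)
            written += len(line)
    return written
```

For example, take_sample('data.csv', 'sample.csv') would produce a sample.csv of at most about 250MB that pandas can load directly.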

answered Oct 10 '22 19:10

Thomas Orozco

I had to do a similar task, and used the pandas package:

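import pandas as pd
# Read 500,000 rows at a time and write each chunk (with header) to its own file
for i, chunk in enumerate(pd.read_csv('bigfile.csv', chunksize=500000)):
    chunk.to_csv('chunk{}.csv'.format(i), index=False)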
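Unlike split -l or a loop over raw lines, pandas parses CSV records, so a quoted field that contains a newline is not cut in two at a chunk boundary. If pandas is not available, the standard library csv module can do the same record-aware chunking. A minimal sketch (split_csv_records and its parameters are hypothetical names):

```python
import csv

def split_csv_records(src_path, rows_per_chunk=500_000, prefix='chunk'):
    """Split src_path into prefix0.csv, prefix1.csv, ... each starting with the header.

    csv.reader yields whole records, so quoted fields containing commas or
    newlines stay intact. Returns the list of files written.
    """
    paths = []
    with open(src_path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader)
        out = writer = None
        try:
            for n, row in enumerate(reader):
                if n % rows_per_chunk == 0:
                    # Close the previous part and start a new one
                    if out is not None:
                        out.close()
                    path = f'{prefix}{n // rows_per_chunk}.csv'
                    out = open(path, 'w', newline='')
                    writer = csv.writer(out)
                    writer.writerow(header)
                    paths.append(path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return paths
```

As with to_csv, the output is rewritten rather than copied byte for byte: csv.writer uses minimal quoting and '\r\n' line endings by default.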

answered Oct 10 '22 20:10

Quentin Febvre

Here is a little Python script I used to split a file data.csv into several CSV part files. The number of part files is controlled by chunk_size (the number of lines per part file).

The header line (column names) of the original file is copied into every part CSV file.

It works for big files because it iterates over the input line by line (readline() is used only for the header) instead of loading the complete file into memory at once; at most chunk_size lines are held in memory at a time.

#!/usr/bin/env python3

def main():
    chunk_size = 9998  # lines

    def write_chunk(part, lines):
        with open('data_part_'+ str(part) +'.csv', 'w') as f_out:
            f_out.write(header)
            f_out.writelines(lines)

    with open('data.csv', 'r') as f:
        count = 0
        header = f.readline()
        lines = []
        for line in f:
            count += 1
            lines.append(line)
            if count % chunk_size == 0:
                write_chunk(count // chunk_size, lines)
                lines = []
        # write remainder
        if len(lines) > 0:
            write_chunk((count // chunk_size) + 1, lines)

if __name__ == '__main__':
    main()
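The script above still collects up to chunk_size lines in a list before writing them. A variant that streams each line straight into the current part file keeps memory use flat whatever the part size. This is a minimal sketch: split_lines and its parameters are hypothetical names, and like the original it assumes one CSV record per line.

```python
import itertools

def split_lines(src_path, lines_per_part=9998, prefix='data_part_'):
    """Stream src_path into prefix1.csv, prefix2.csv, ..., repeating the header.

    Lines are written as soon as they are read, so memory use does not grow
    with lines_per_part. Returns the list of files written.
    """
    paths = []
    # newline='' keeps the original line endings on both read and write
    with open(src_path, 'r', newline='') as src:
        header = next(src, '')
        part = 1
        while True:
            # islice lazily takes at most lines_per_part lines from src
            batch = itertools.islice(src, lines_per_part)
            first = next(batch, None)
            if first is None:
                break  # input exhausted; no empty trailing part
            path = f'{prefix}{part}.csv'
            with open(path, 'w', newline='') as out:
                out.write(header)
                out.write(first)
                out.writelines(batch)
            paths.append(path)
            part += 1
    return paths
```

Part numbering matches the script above: full parts are numbered from 1 and the remainder gets the next number.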

answered Oct 10 '22 20:10

Roberto

Tags: python, split, csv