Writing large Pandas Dataframes to CSV file in chunks


How do I write out large data files to a CSV file in chunks?

I have a set of large data files (1M rows x 20 cols). However, only 5 or so columns of the data files are of interest to me.

I want to make things easier by making copies of these files with only the columns of interest, so I have smaller files to work with for post-processing. My plan is to read each file into a dataframe and then write it out to a new CSV file.

I've been looking into reading large data files in chunks into a dataframe. However, I haven't been able to find anything on how to write out the data to a csv file in chunks.

Here is what I'm trying now, but it doesn't append to the CSV file:

    with open(os.path.join(folder, filename), 'r') as src:
        df = pd.read_csv(src, sep='\t', skiprows=(0, 1, 2), header=0, chunksize=1000)
        for chunk in df:
            chunk.to_csv(os.path.join(folder, new_folder,
                                      "new_file_" + filename),
                         columns=[['TIME', 'STUFF']])
asked Jul 22 '16 by Korean_Of_the_Mountain

People also ask

How big is too big for a pandas DataFrame?

There is no hard-coded size limit for a pandas DataFrame. Because pandas holds all of its data in memory, the practical limit is the RAM available on your machine; many operations also make temporary copies, so it is safest to keep a DataFrame well below total available memory and to process anything larger in chunks.
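As a quick way to see how close you are to that limit, you can check a DataFrame's actual memory footprint. A minimal sketch with made-up data, roughly the shape described in the question (1M rows x 20 cols):

    import numpy as np
    import pandas as pd

    # Hypothetical frame; the column names are placeholders.
    df = pd.DataFrame(np.random.rand(1_000_000, 20),
                      columns=[f"col{i}" for i in range(20)])

    # deep=True also counts the real memory of object (string) columns.
    print(df.memory_usage(deep=True).sum() / 1e6, "MB")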

How do I chunk a CSV file in Python?

Use chunksize to read a large CSV file: call pandas.read_csv(file, chunksize=chunk) to read the file, where chunk is the number of lines to be read in per chunk.
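For example, a minimal sketch of chunked reading (the file name, separator, and chunk size are placeholders):

    import pandas as pd

    # read_csv with chunksize returns an iterator instead of one big DataFrame.
    reader = pd.read_csv("data.tsv", sep="\t", chunksize=1000)

    for chunk in reader:
        # Each chunk is an ordinary DataFrame of up to 1000 rows.
        print(chunk.shape)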

Is pandas efficient for large data sets?

Use efficient data types. The default pandas data types are not the most memory-efficient. This is especially true for text data columns with relatively few unique values (commonly referred to as “low-cardinality” data). By using more efficient data types, you can store larger datasets in memory.
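A minimal sketch of this idea with made-up data: a low-cardinality text column stored as the category dtype usually takes far less memory than the default object dtype.

    import pandas as pd

    # 300,000 rows but only two distinct string values.
    df = pd.DataFrame({"status": ["open", "closed", "open"] * 100_000})

    print(df["status"].memory_usage(deep=True))                      # object dtype
    print(df["status"].astype("category").memory_usage(deep=True))   # typically much smaller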


1 Answer

Solution:

    header = True
    for chunk in chunks:   # chunks: the iterator returned by pd.read_csv(..., chunksize=...)
        chunk.to_csv(os.path.join(folder, new_folder, "new_file_" + filename),
                     header=header, columns=['TIME', 'STUFF'], mode='a')
        header = False

Notes:

  • mode='a' tells pandas to append to the file instead of overwriting it.
  • The column header is written only with the first chunk.
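Putting the question's reading code and this answer together, a complete sketch might look like the following. The paths, separator, skipped rows, and the TIME/STUFF columns come from the question; the concrete folder and file names here are placeholders, and index=False is an extra assumption to keep the row index out of the output.

    import os
    import pandas as pd

    # Placeholder values standing in for the question's variables.
    folder, new_folder, filename = "data", "trimmed", "run1.tsv"

    src_path = os.path.join(folder, filename)
    dst_path = os.path.join(folder, new_folder, "new_file_" + filename)

    header = True
    for chunk in pd.read_csv(src_path, sep='\t', skiprows=(0, 1, 2),
                             header=0, chunksize=1000):
        chunk.to_csv(dst_path, columns=['TIME', 'STUFF'],
                     header=header, mode='a', index=False)
        header = False   # only the first chunk writes the header row

Since only a few columns are of interest at all, passing usecols=['TIME', 'STUFF'] to read_csv would also avoid parsing the unwanted columns in the first place.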
answered Sep 17 '22 by Scratch'N'Purr