 

Process very big csv file without timeout and memory error

At the moment I'm writing an import script for a very big CSV file. The problem is that it usually stops after a while, either because of a timeout or because it throws a memory error.

My idea was to parse the CSV file in steps of 100 lines and, after each 100 lines, re-invoke the script automatically. I tried to achieve this with header('Location: ...'), passing the current line via GET, but it didn't work out as I wanted.

Is there a better way to do this, or does someone have an idea how to get rid of the memory error and the timeout?
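For reference, a minimal sketch of the chunked approach described above (the chunk size, the `offset` GET parameter, and the script name `import.php` are all assumptions for illustration, not a tested implementation):

```php
<?php
// import.php - process a large CSV in chunks of 100 lines, then redirect
// to itself with the next offset, so no single request runs long enough
// to hit the execution-time limit.
$chunkSize = 100;
$offset    = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

$handle = fopen('big.csv', 'r');
if ($handle === false) {
    exit('could not open file');
}

// skip the lines that were already processed by earlier requests
for ($i = 0; $i < $offset; $i++) {
    if (fgets($handle) === false) {
        fclose($handle);
        exit('done');
    }
}

$processed = 0;
while ($processed < $chunkSize && ($data = fgetcsv($handle)) !== false) {
    // insert $data into the database here
    $processed++;
}
fclose($handle);

if ($processed === $chunkSize) {
    // more lines remain: re-invoke ourselves with the new offset
    header('Location: import.php?offset=' . ($offset + $processed));
    exit;
}
```

Note that skipping already-processed lines with fgets makes each request slower than the last; remembering the byte position from ftell() and seeking to it with fseek() would avoid re-reading the start of the file. Running the import from the CLI (where max_execution_time defaults to unlimited) sidesteps the timeout entirely.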

Julian asked Sep 06 '11


1 Answer

I've used fgetcsv to read a 120 MB CSV in a stream-wise manner. It reads the file line by line, and I then inserted every line into a database. That way only one line is held in memory on each iteration. The script still needed 20 minutes to run; maybe I'll try Python next time. Don't try to load a huge CSV file into an array, as that really would consume a lot of memory.

// WDI_GDF_Data.csv (120.4MB) is the World Bank collection of development indicators:
// http://data.worldbank.org/data-catalog/world-development-indicators
if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false) {
    // get the first row, which contains the column titles (if necessary)
    $header = fgetcsv($handle);

    // loop through the file line by line
    while (($data = fgetcsv($handle)) !== false) {
        // resort/rewrite data and insert into DB here
        // try to use conditions sparingly here, as those will cause slow performance

        // I don't know if this is really necessary, but it couldn't harm;
        // see also: http://php.net/manual/en/features.gc.php
        unset($data);
    }
    fclose($handle);
}
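Much of the 20-minute runtime likely comes from issuing one INSERT per row. A hedged sketch of batching rows into multi-row INSERT statements instead (the PDO connection, the `indicators` table name, and the batch size are assumptions for illustration; it also assumes every CSV row has the same column count):

```php
<?php
// $pdo is assumed to be a connected PDO instance
$batch     = [];
$batchSize = 500;

if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false) {
    $header = fgetcsv($handle); // skip the column titles

    while (($data = fgetcsv($handle)) !== false) {
        $batch[] = $data;
        if (count($batch) >= $batchSize) {
            insertBatch($pdo, $batch);
            $batch = [];
        }
    }
    if ($batch !== []) {
        insertBatch($pdo, $batch); // flush the remainder
    }
    fclose($handle);
}

// build one multi-row INSERT for the whole batch, e.g.
// INSERT INTO indicators VALUES (?,?,...),(?,?,...),...
function insertBatch(PDO $pdo, array $rows): void
{
    $cols           = count($rows[0]);
    $rowPlaceholder = '(' . rtrim(str_repeat('?,', $cols), ',') . ')';
    $sql = 'INSERT INTO indicators VALUES '
         . rtrim(str_repeat($rowPlaceholder . ',', count($rows)), ',');
    $stmt = $pdo->prepare($sql);
    $stmt->execute(array_merge(...$rows));
}
```

Fewer round trips to the database usually cut the runtime substantially; wrapping the whole import in a single transaction helps as well.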
feeela answered Oct 08 '22