 

PHP: Using fgetcsv on a huge CSV file

Tags: php, fgetcsv

Using fgetcsv, can I somehow do a destructive read where rows I've read and processed would be discarded so if I don't make it through the whole file in the first pass, I can come back and pick up where I left off before the script timed out?

Additional Details:

I'm getting a daily product feed from a vendor that comes across as a 200 MB .gz file. When I unpack the file, it turns into a 1.5 GB .csv with nearly 500,000 rows and 20 to 25 fields. I need to read this information into a MySQL database, ideally with PHP so I can schedule a cron job to run the script at my web hosting provider every day.

I have a hard timeout on the server set to 180 seconds by the hosting provider, and a max memory utilization limit of 128 MB for any single script. These limits cannot be changed by me.

My idea was to grab the information from the .csv using the fgetcsv function, but since I'm expecting to have to take multiple passes at the file because of the 3-minute timeout, I was thinking it would be nice to whittle away at the file as I process it so I wouldn't need to spend cycles skipping over rows that were already processed in a previous pass.
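
For illustration only, here is a minimal sketch of that workflow under one assumption of my own: the feed is saved as feed.csv.gz (a placeholder name). PHP's compress.zlib:// stream wrapper lets fgetcsv read the gzipped feed directly, so the 1.5 GB file never has to be unpacked to disk and only one row is held in memory at a time.

<?php
// Sketch only: "feed.csv.gz" is a placeholder for the vendor's daily file.
// The compress.zlib:// wrapper decompresses on the fly while reading.
$fh = fopen('compress.zlib://feed.csv.gz', 'r');
if ($fh === false) {
    exit(1); // could not open the feed
}

while (($row = fgetcsv($fh)) !== false) {
    // ... insert $row into MySQL here ...
}

fclose($fh);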

asked Oct 22 '13 by Robert82




3 Answers

From your problem description it really sounds like you need to switch hosts. Processing a 2 GB file with a hard time limit is not a very constructive environment. Having said that, deleting read lines from the file is even less constructive, since you would have to rewrite the entire 2 GB to disk minus the part you have already read, which is incredibly expensive.

Assuming you save how many rows you have already processed, you can skip rows like this:

$alreadyProcessed = 42; // for example

$fileHandle = fopen('my.csv', 'r');

$i = 0;
while (($row = fgetcsv($fileHandle)) !== false) {
    if ($i++ < $alreadyProcessed) {
        continue; // skip rows handled in a previous pass
    }

    // ... process $row ...
}

However, this means you're reading the entire 2 GB file from the beginning each time you go through it, which in itself already takes a while and you'll be able to process fewer and fewer rows each time you start again.

The best solution here is to remember the current position of the file pointer, for which ftell is the function you're looking for:

$lastPosition = file_exists('last_position.txt')
    ? (int) file_get_contents('last_position.txt')
    : 0;

$fh = fopen('my.csv', 'r');
fseek($fh, $lastPosition);

while (($row = fgetcsv($fh)) !== false) {
    // ... process $row ...

    file_put_contents('last_position.txt', ftell($fh));
}

This allows you to jump right back to the last position you were at and continue reading. You obviously want to add a lot of error handling here, so you're never in an inconsistent state no matter which point your script is interrupted at.
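
For illustration, here is a minimal sketch of one way to flesh that out, reusing the my.csv and last_position.txt names from above; the 1000-row checkpoint interval and the 170-second budget are arbitrary values chosen to stay under the 180-second hard limit, not requirements.

$startTime    = time();
$timeBudget   = 170;                  // stop shortly before the host's 180 s cap
$positionFile = 'last_position.txt';

$fh = fopen('my.csv', 'r');
if ($fh === false) {
    exit(1); // could not open the feed
}

// Resume from the byte offset saved by the previous run, if any.
if (is_readable($positionFile)) {
    fseek($fh, (int) file_get_contents($positionFile));
}

$rowsSinceCheckpoint = 0;
while (($row = fgetcsv($fh)) !== false) {
    // ... insert $row into MySQL here ...

    // Persist the position every 1000 rows instead of on every single row.
    if (++$rowsSinceCheckpoint >= 1000) {
        file_put_contents($positionFile, (string) ftell($fh));
        $rowsSinceCheckpoint = 0;
    }

    // Stop cleanly before the hard timeout; the next cron run resumes here.
    if (time() - $startTime >= $timeBudget) {
        break;
    }
}

// Final checkpoint so the next run starts exactly where this one stopped.
file_put_contents($positionFile, (string) ftell($fh));
fclose($fh);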

answered by deceze


You can avoid timeouts and memory errors to some extent by reading the file like a stream: read it line by line and insert each line into the database (or process it accordingly). That way only a single line is held in memory on each iteration. Note that you should not try to load the whole huge CSV file into an array; that really would consume a lot of memory.

if(($handle = fopen("yourHugeCSV.csv", 'r')) !== false)
{
    // Get the first row (Header)
    $header = fgetcsv($handle);

    // loop through the file line-by-line
    while(($data = fgetcsv($handle)) !== false)
    {
        // Process Your Data
        unset($data);
    }
    fclose($handle);
}
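
For what it's worth, here is a minimal sketch of what the "Process Your Data" step could look like with a PDO prepared statement; the DSN, credentials, the products table and its sku/name/price columns are placeholders, not anything taken from the question.

// Sketch only: connection details, table and column names are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8mb4', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare('INSERT INTO products (sku, name, price) VALUES (?, ?, ?)');

if (($handle = fopen("yourHugeCSV.csv", 'r')) !== false)
{
    $header = fgetcsv($handle); // skip the header row

    while (($data = fgetcsv($handle)) !== false)
    {
        // Only the current row is in memory; the prepared statement is reused.
        $stmt->execute([$data[0], $data[1], $data[2]]);
    }
    fclose($handle);
}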
answered by Jenson M John


I think a better solution (it would be phenomenally inefficient to continuously rewind and write to an open file stream) would be to track the file position of each record read (using ftell) and store it with the data you've read; then, if you have to resume, just fseek to the last position.

You could try loading the file directly using MySQL's LOAD DATA INFILE (which will likely be a lot faster), although I've had problems with this in the past and ended up writing my own PHP code.
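
For reference, a hedged sketch of that route using LOAD DATA LOCAL INFILE through PDO; the DSN, credentials, file path, table and column list are placeholders, and the hosting provider has to allow LOCAL INFILE (the PDO::MYSQL_ATTR_LOCAL_INFILE option here plus the server's local_infile setting) for it to work at all.

// Sketch only: all names and paths below are placeholders.
$pdo = new PDO(
    'mysql:host=localhost;dbname=shop;charset=utf8mb4',
    'user',
    'pass',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true] // required for LOCAL INFILE
);

$sql = <<<SQL
LOAD DATA LOCAL INFILE '/path/to/products.csv'
INTO TABLE products
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\\n'
IGNORE 1 LINES
(sku, name, price)
SQL;

$pdo->exec($sql); // lets the MySQL server parse and insert the rows itself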

I have a hard timeout on the server set to 180 seconds by the hosting provider, and a max memory utilization limit of 128 MB for any single script. These limits cannot be changed by me.

What have you tried?

Memory can be limited by means other than the php.ini file, but I can't imagine how anyone could actually prevent you from using a different execution time (even if ini_set is disabled, from the command line you could run php -d max_execution_time=3000 /your/script.php or php -c /path/to/custom/inifile /your/script.php).

Unless you are trying to fit the entire data file into memory, there should be no issue with a memory limit of 128 MB.

answered by symcbean