PHP loading large csv file - memory issues

Tags: php, memory

I have the following code:

$file="postcodes.csv";
$csv= file_get_contents($file);
$array = array_map("str_getcsv", explode("\n", $csv));
$json = json_encode($array);
print_r($json);

postcodes.csv is 603MB, so it is a large file.

In php.ini, if I have

memory_limit=1024M

I get the error

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 256 bytes) in ...

If I increase the memory limit to 2056M, I get the error

Fatal error: Out of memory (allocated 1919680512) (tried to allocate 36 bytes) in...

It is similar if I change it to -1.

So how can I load this csv file without having memory issues?

Thanks

asked Mar 15 '23 10:03 by katie hudson


2 Answers

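Instead of reading the whole file into a variable, splitting it on newlines and then running str_getcsv on each array element, read and parse the file one line at a time with fgetcsv. That way only the current line has to be parsed at any moment.

Depending on what you are after, you can build either one JSON string containing the values from every line, or multiple JSON strings, one for each CSV line.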

$h = fopen("postcodes.csv", "r"); // open read-only
$array = array();

if ($h !== FALSE) {
    $str = '';
    while (($data = fgetcsv($h)) !== FALSE) {

        $str .= json_encode($data); // add each json string to a string variable, save later
        // or
        $array[] = $data; // collect the rows and encode them all at once below
    }
    fclose($h); // only close the handle if it was opened
}

$finalJsonString = json_encode($array);

I wouldn't recommend that you print_r an entire array or json object of that size since it would be difficult to follow.
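
If you go for the one-JSON-string-per-line option, a minimal sketch could write each string to an output file as soon as it is encoded, so neither $str nor $array grows with the size of the CSV. The function name csvToJsonLines and both paths are made up for illustration, and the blank-line check assumes fgetcsv's documented behaviour of returning an array with a single null for an empty line.

```php
<?php
// Hypothetical helper: converts a CSV file to one JSON string per line,
// holding only the current row in memory. Returns the number of rows written.
function csvToJsonLines(string $csvPath, string $outPath): int
{
    $in = fopen($csvPath, "r");
    if ($in === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    $out = fopen($outPath, "w");
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $outPath");
    }

    $rows = 0;
    // Length 0 means "no line length limit"; the other arguments are the defaults.
    while (($data = fgetcsv($in, 0, ",", '"', "\\")) !== false) {
        if ($data === [null]) {
            continue; // blank line in the CSV, nothing to write
        }
        fwrite($out, json_encode($data) . "\n");
        $rows++;
    }

    fclose($in);
    fclose($out);
    return $rows;
}
```

Calling csvToJsonLines("postcodes.csv", "postcodes.jsonl") would then produce a file you can process line by line later, instead of one huge string.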

answered Mar 17 '23 03:03 by Alex Andrei


You can read your file line by line.

For example,

$file="postcodes.csv";
$array = array();
if (($handle = fopen($file, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $array[]=$data;
    }
    fclose($handle);
}
$json = json_encode($array);
print_r($json);

A memory problem can still happen, though, if you have a lot of data: every row is still collected in $array, and json_encode then builds the whole JSON string in memory on top of that.
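
One way around that, sketched here under the assumption that you only need to output the JSON rather than keep it in a variable, is to write the JSON array piece by piece while reading. The helper names readCsvRows and streamJsonArray are hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper: yields one parsed CSV row at a time,
// so only the current row is held in memory.
function readCsvRows(string $path): Generator
{
    $handle = fopen($path, "r");
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($data = fgetcsv($handle, 0, ",", '"', "\\")) !== false) {
            if ($data !== [null]) { // fgetcsv returns [null] for a blank line
                yield $data;
            }
        }
    } finally {
        fclose($handle);
    }
}

// Hypothetical helper: writes the rows to the stream $out as a single
// JSON array, encoding and writing one row at a time.
function streamJsonArray(iterable $rows, $out): void
{
    fwrite($out, "[");
    $first = true;
    foreach ($rows as $row) {
        fwrite($out, ($first ? "" : ",") . json_encode($row));
        $first = false;
    }
    fwrite($out, "]");
}
```

For the file in the question this could be used as streamJsonArray(readCsvRows("postcodes.csv"), fopen("php://output", "w")); the output is the same JSON array as before, but memory use no longer depends on the number of rows.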

answered Mar 17 '23 03:03 by Roman Gelembjuk