
PHP script gets progressively slower (file reader)

I have a script that, when timed, gets progressively slower. It's fairly simple: it reads a line, checks it, adds it to the database, then proceeds to the next line.

Here's the output of it gradually getting worse:

Record: #1,001 Memory: 1,355,360kb taking 1.84s
Record: #2,002 Memory: 1,355,192kb taking 2.12s
Record: #3,003 Memory: 1,355,192kb taking 2.39s
Record: #4,004 Memory: 1,355,192kb taking 2.65s
Record: #5,005 Memory: 1,355,200kb taking 2.94s
Record: #6,006 Memory: 1,355,376kb taking 3.28s
Record: #7,007 Memory: 1,355,176kb taking 3.56s
Record: #8,008 Memory: 1,355,408kb taking 3.81s
Record: #9,009 Memory: 1,355,464kb taking 4.07s
Record: #10,010 Memory: 1,355,392kb taking 4.32s
Record: #11,011 Memory: 1,355,352kb taking 4.63s
Record: #12,012 Memory: 1,355,376kb taking 4.90s
Record: #13,013 Memory: 1,355,200kb taking 5.14s
Record: #14,014 Memory: 1,355,184kb taking 5.43s
Record: #15,015 Memory: 1,355,344kb taking 5.72s

The file, unfortunately, is around 20 GB, so at this rate of increase I'll probably be dead by the time the whole thing is read. The code is (mainly) below. I suspect it has something to do with fgets(), but I am not sure what.

    $handle = fopen ($import_file, 'r');

    while ($line = fgets ($handle))
    {
        $data = json_decode ($line);

        save_record ($data, $line);
    }

Thanks in advance!

EDIT:

Commenting out 'save_record ($data, $line);' does not appear to change the slowdown.

asked Nov 14 '22 08:11 by DCD


1 Answer

Sometimes it is better to use system commands for reading these large files. I ran into something similar and here is a little trick I used:

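    // Count lines; feeding the file via stdin makes wc print only the number.
    $lines = (int) exec('wc -l < ' . escapeshellarg($filename));
    for ($i = 1; $i <= $lines; $i++) {
        // Print line $i, then quit so sed does not scan the rest of the file.
        $line = exec('sed -n ' . escapeshellarg($i . '{p;q;}') . ' ' . escapeshellarg($filename));

        // do what you want with the record here
    }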

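I would not recommend this with files or file names that cannot be trusted, since the name is passed to a shell command, but it ran fast for me because it pulls one record at a time using system tools. Be aware that each sed call starts a new process and reads the file again from the top, so it is worth timing it on a sample of your data first. Hope this helps.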
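For comparison, here is a minimal single-pass sketch in plain PHP, assuming one JSON document per line as in the question. The helper names `read_lines` and `time_batches` and the batch size of 1000 are made up for illustration. It times each batch on its own rather than from the start of the script, which may help show whether the per-line cost is really growing.

```php
<?php
// Hypothetical helper: yields one line at a time, so memory use stays flat.
function read_lines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Compare against false so a final line containing only "0"
        // does not end the loop early.
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        fclose($handle);
    }
}

// Hypothetical helper: calls $callback for every line and returns the
// number of seconds spent on each complete batch of $batchSize lines.
function time_batches(string $path, callable $callback, int $batchSize = 1000): array
{
    $timings = [];
    $count = 0;
    $start = microtime(true);
    foreach (read_lines($path) as $line) {
        $callback($line);
        if (++$count % $batchSize === 0) {
            $now = microtime(true);
            $timings[] = $now - $start;
            $start = $now; // reset so each entry covers one batch only
        }
    }
    return $timings;
}
```

With the question's code this might be called as `time_batches($import_file, function ($line) { save_record(json_decode($line), $line); })`. If the returned numbers stay roughly constant, the reading loop is probably not what is slowing down.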

answered Dec 06 '22 20:12 by Chuck Burgess