 

How to read a big file in PHP without hitting the memory limit

I'm trying to read a file line by line. The problem is that the file is too big (over 500,000 lines) and I hit the memory limit. I wonder how to read the file without exceeding the memory limit.

I'm thinking about a multi-threaded solution (split the file into smaller groups of 100,000 lines each and read them in multiple threads), but I don't know how to do it in detail. Please help me. (Sorry for my bad English.)

Here is my code

$fn = fopen("myfile.txt", "r");

while(!feof($fn)) {
    $result = fgets($fn);
    echo $result;
}

fclose($fn);
asked Dec 13 '22 by user3391056


2 Answers

You could use a generator to handle the memory usage. This is just an example written by a user on the documentation page:

function getLines($file)
{
    $f = fopen($file, 'r');

    try {
        while (($line = fgets($f)) !== false) {
            yield $line;
        }
    } finally {
        fclose($f);
    }
}

foreach (getLines("file.txt") as $n => $line) {
    // insert the line into db or do whatever you want with it.
}

A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate. Instead, you can write a generator function, which is the same as a normal function, except that instead of returning once, a generator can yield as many times as it needs to in order to provide the values to be iterated over.
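To make the difference concrete, here is a minimal sketch (the file name is an assumption, and it reuses the getLines() generator above). Loading everything with file() builds the whole array up front, while the generator keeps only one line in memory at a time:

// Run each variant in a separate script so memory_get_peak_usage()
// reflects only that variant.

// Variant A: file() reads every line into an array at once, so peak
// memory grows with the file size and can exceed memory_limit.
// $lines = file("big-file.txt");
// foreach ($lines as $line) { /* process $line */ }

// Variant B: the getLines() generator yields one line at a time,
// so peak memory stays roughly constant regardless of file size.
foreach (getLines("big-file.txt") as $line) {
    // process $line
}

echo "Peak memory: " . memory_get_peak_usage(true) . " bytes\n";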

answered Jan 21 '23 by Mihai Matei


In my experience, PHP cleans up memory best when a scope is cleared. A loop doesn't count as a scope, but a function does.
So hand your file pointer to a function, do your database work inside that function, and then return to the loop, where you can call gc_collect_cycles(). This helps you manage your memory and forces PHP to clean up after itself.

I also recommend not echoing the output, but logging to a file instead. You can then use tail -f filename to follow that log output (in Windows Subsystem for Linux, Git Bash for Windows, or on Linux).
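For example, a minimal sketch of that logging approach (the log file name and the logMsg() helper are assumptions):

function logMsg($message)
{
    // Append to a log file instead of echoing to stdout.
    file_put_contents("import.log", date('c') . " " . $message . "\n", FILE_APPEND);
}

logMsg("processed 100000 lines");

// In another terminal, follow the log as it is written:
//   tail -f import.log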

I use a method similar to the one below to handle large files with millions of entries, and it helps with staying under the memory limit.

function dostuff($fn) 
{
    $result = fgets($fn);
    // store database, do transforms, whatever
    echo $result;
}

$fn = fopen("myfile.txt", "r");

while(!feof($fn)) {
    dostuff($fn);
    flush(); // only need this if you do the echo thing.
    gc_collect_cycles();
}

fclose($fn);
answered Jan 21 '23 by Tschallacka