 

Best way to extract text from a 1.3GB text file using PHP?

Tags: file, php

I have a 1.3GB text file that I need to extract some information from in PHP. I have researched it and come up with a few different ways to do what I need, but as always I am after a little clarification on which method would be best, or whether a better one exists that I do not know about.

The information I need from the text file is only the first 40 characters of each line, and there are around 17 million lines in the file. The 40 characters from each line will be inserted into a database.

The methods I have are below:

// REMOVE TIME LIMIT
set_time_limit(0);
// REMOVE MEMORY LIMIT
ini_set('memory_limit', '-1');
// OPEN FILE
$handle = @fopen('C:\Users\Carl\Downloads\test.txt', 'r');
if($handle) {
    $insert = array();
    // READ ONE LINE AT A TIME, KEEPING THE FIRST 40 CHARACTERS
    while(($buffer = fgets($handle)) !== false) {
        $insert[] = substr($buffer, 0, 40);
    }
    // IF THE LOOP ENDED BEFORE EOF, A READ ERROR OCCURRED
    if(!feof($handle)) {
        // HANDLE READ ERROR
    }
    fclose($handle);
}

The above reads one line at a time and extracts the data. I have all the database inserts sorted, doing 50 inserts at a time, ten times over, in a transaction, as sketched below.
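For reference, the insert stage looks roughly like this (a simplified sketch; the table name, column name, and PDO connection details are placeholders, not my exact code):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// 50 ROWS PER MULTI-ROW INSERT, TEN INSERTS PER TRANSACTION
foreach (array_chunk(array_chunk($insert, 50), 10) as $group) {
    $pdo->beginTransaction();
    foreach ($group as $batch) {
        $placeholders = implode(',', array_fill(0, count($batch), '(?)'));
        $stmt = $pdo->prepare("INSERT INTO some_table (some_column) VALUES $placeholders");
        $stmt->execute($batch);
    }
    $pdo->commit();
}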

The next method is essentially the same as above, but calls file() to store all the lines in an array before looping over them with foreach to get the data. I am not sure about this method though, as the array would hold over 17 million values.

Another method would be to extract only part of the file, rewrite the file with the remaining unprocessed data, and after that part has been executed, call the script again via a header() redirect.

What would be the best way to get this done in the quickest and most efficient manner? Or is there a better approach that I have not thought of?

I also plan to use this script with WAMP, but running it in a browser while testing has caused timeout problems, even with the script time limit set to 0. Is there a way I can execute the script without accessing the page through a browser?

asked Jun 06 '12 by Griff

2 Answers

You have it right so far; don't use the file() function, as it would most probably hit the RAM limit and terminate your script.

I wouldn't even accumulate the lines into the $insert[] array, as that wastes RAM as well. If you can, insert into the database right away.

BTW, there is a nice tool called "cut" that you could use to process the file.

cut -c1-40 file.txt

You could even pipe cut's output to a PHP script that inserts into the database.

cut -c1-40 file.txt | php -f inserter.php

inserter.php could then read lines from php://stdin and insert them into the database.
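A minimal sketch of such an inserter.php (assuming PDO with MySQL; the credentials, table name, and column name are placeholders):

<?php
// READ LINES FROM STDIN AND INSERT EACH INTO THE DATABASE
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stmt = $pdo->prepare('INSERT INTO some_table (some_column) VALUES (?)');

$stdin = fopen('php://stdin', 'r');
$pdo->beginTransaction();
$count = 0;
while (($line = fgets($stdin)) !== false) {
    $stmt->execute(array(rtrim($line, "\r\n")));
    // COMMIT EVERY 500 ROWS SO THE TRANSACTION DOESN'T GROW UNBOUNDED
    if (++$count % 500 === 0) {
        $pdo->commit();
        $pdo->beginTransaction();
    }
}
$pdo->commit();
fclose($stdin);

This way only one line is held in memory at a time, no matter how big the file is.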

"cut" is a standard tool available on all Linuxes, if you use Windows you can get it with MinGW shell, or as part of msystools (if you use git) or install native win32 app using gnuWin32.

answered by Milan Babuškov


Why are you doing this in PHP when your RDBMS almost certainly has bulk import functionality built in? MySQL, for example, has LOAD DATA INFILE:

LOAD DATA INFILE 'data.txt'
INTO TABLE `some_table`
  FIELDS TERMINATED BY ''
  LINES TERMINATED BY '\n'
  ( @line )
SET `some_column` = LEFT( @line, 40 );

One query.

MySQL also has the mysqlimport utility that wraps this functionality from the command line.
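A hypothetical invocation might look like this (mysqlimport derives the table name from the file name, and as far as I know it doesn't expose the SET clause, so the 40-character trimming still needs the raw LOAD DATA INFILE statement above):

mysqlimport --local --lines-terminated-by='\n' some_db /path/to/some_table.txt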

answered by Jordan Running