I have a 1.3 GB text file that I need to extract some information from in PHP. I have researched it and come up with a few ways to do what I need, but as always I am after a little clarification on which method would be best, or whether a better one exists that I don't know about.
The information I need from the text file is only the first 40 characters of each line, and there are around 17 million lines in the file. The 40 characters from each line will be inserted into a database.
The methods I have are below:
// REMOVE TIME LIMIT
set_time_limit(0);
// REMOVE MEMORY LIMIT
ini_set('memory_limit', '-1');
// OPEN FILE
$handle = @fopen('C:\Users\Carl\Downloads\test.txt', 'r');
if ($handle) {
    while (($buffer = fgets($handle)) !== false) {
        $insert[] = substr($buffer, 0, 40);
    }
    if (!feof($handle)) {
        // ERROR: fgets() failed before the end of the file was reached
    }
    fclose($handle);
}
The above reads the file one line at a time and extracts the data. I have all the database inserts sorted, doing 50 inserts at a time, ten times over, in a transaction.
The next method is much the same as above, but calling file() to store all the lines in an array before doing a foreach to get the data. I am not sure about this method, though, as the array would essentially hold over 17 million values.
Another method would be to extract only part of the file, rewrite the file with the unused data, and after that part has been executed recall the script using a header() call.
What would be the quickest and most efficient way of getting this done? Or is there a better way to approach this that I have not thought of?
Also, I plan to use this script with WAMP, but running it in a browser while testing has caused timeout problems, even with the script time limit set to 0. Is there a way I can execute the script without accessing the page through a browser?
You have it good so far; don't use the file() function, as it would most probably hit the RAM usage limit and terminate your script.
I wouldn't even accumulate stuff into the $insert[] array, as that wastes RAM as well. If you can, insert into the database right away.
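A minimal sketch of that streaming approach, assuming PDO with a prepared statement; the DSN, credentials, and the lines(prefix) table are hypothetical placeholders, not from the question:

<?php
// Sketch only: stream the file line by line and insert immediately,
// so memory use stays flat regardless of file size.
// DSN, credentials, and table/column names are assumptions.
set_time_limit(0);

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stmt = $pdo->prepare('INSERT INTO `lines` (`prefix`) VALUES (?)');

$handle = fopen('C:\Users\Carl\Downloads\test.txt', 'r');
if ($handle) {
    $pdo->beginTransaction();
    $count = 0;
    while (($buffer = fgets($handle)) !== false) {
        $stmt->execute(array(substr($buffer, 0, 40)));
        // Commit in batches so one huge transaction doesn't build up.
        if (++$count % 500 === 0) {
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
    $pdo->commit();
    fclose($handle);
}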
BTW, there is a nice tool called "cut" that you could use to process the file.
cut -c1-40 file.txt
You could even redirect cut's stdout to some PHP script that inserts into database.
cut -c1-40 file.txt | php -f inserter.php
inserter.php could then read lines from php://stdin and insert them into the DB.
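A minimal sketch of what inserter.php could look like, again assuming the same hypothetical PDO connection and lines(prefix) table as above:

<?php
// Sketch only: read the 40-character lines that cut pipes in on stdin
// and insert each one. Connection and table names are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO `lines` (`prefix`) VALUES (?)');

$stdin = fopen('php://stdin', 'r');
while (($line = fgets($stdin)) !== false) {
    $stmt->execute(array(rtrim($line, "\r\n")));
}
fclose($stdin);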
"cut" is a standard tool available on all Linuxes, if you use Windows you can get it with MinGW shell, or as part of msystools (if you use git) or install native win32 app using gnuWin32.
Why are you doing this in PHP when your RDBMS almost certainly has bulk import functionality built in? MySQL, for example, has LOAD DATA INFILE:
LOAD DATA INFILE 'data.txt'
INTO TABLE `some_table`
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\n'
( @line )
SET `some_column` = LEFT( @line, 40 );
One query.
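If you would rather kick that statement off from PHP than from the mysql client, a rough sketch using PDO (DSN, credentials, and names are assumptions; for the non-LOCAL form the file must be readable by the MySQL server itself):

<?php
// Sketch only: issue the LOAD DATA statement through PDO.
// DSN, credentials, and table/column names are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$sql = "LOAD DATA INFILE 'data.txt'
    INTO TABLE `some_table`
    FIELDS TERMINATED BY ''
    LINES TERMINATED BY '\\n'
    ( @line )
    SET `some_column` = LEFT( @line, 40 )";

$rows = $pdo->exec($sql); // number of rows imported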
MySQL also has the mysqlimport utility that wraps this functionality from the command line.