Requirement:
We have two similar tables in two servers. First table in server having unique key columns A,B,C and we are inserting Table1 rows into Table2 which having unique key columns B,C,D.
Table1 is having approx 5 millions rows and Table2 will insert approx .3 millions rows due to different unique key columns constraints.
Here requirement is to fetch all the rows from Table1 and insert into Table2 if no same record is existing in Table2 and in case of record match, increase the count and update the 'cron_modified_date' column in Table2.
PHP version is 5.5 and MySQL version is 5.7 for this setup and DB server is having 6 GB RAM.
While executing below script, processing speed is getting very slow after processing 2 millions records and RAM is not freeing up and after sometime all RAM is consumed by script and after that script is not processing at all.
As you can see, I am resetting the variables and closing the DB connection as well but its not freeing up the DB server RAM. After some reading, i came to know, may be PHP garbage collection needs to call manually to free up the resources but its also not freeing up the RAM.
What I am doing wrong here and how to process millions records using the PHP, MYSQL?
Any other way to free up the RAM while executing the script and so that script should compete the execution?
/* Fetch records count for batch insert*/
$queryCount = "SELECT count(*) as totalRecords FROM TABLE1 where created_date > = '2018-02-10'";
$rowsCount = $GLOBALS['db']->execRaw( $queryCount)->fetchAll();
$recordsPerIteration = 50000 ;
$totalCount = $rowsCount[0]['totalRecords'];
$start = 0;
gc_disable() ;
if ( $totalCount > 0 ) {
while ( $totalCount > 0 ) {
$query = "SELECT * FROM TABLE1
WHERE where created_date > = '2018-02-10'
ORDER BY suggestion_id DESC
LIMIT ".$start.",".$recordsPerIteration;
print "sql is $query" ;
$getAllRows = $GLOBALS['db']->execRaw( $query )->fetchAll();
$GLOBALS['db']->queryString = null;
$GLOBALS['db']->close() ;
foreach ($getAllRows as $getRow) {
$insertRow = " INSERT INTO TABLE2 (
Name,
Company,
ProductName,
Status,
cron_modified_date)
VALUE (
".$GLOBALS['db_ab']->quote($getRow['Name']).",
".$GLOBALS['db_ab']->quote($getRow['Company']).",
".$GLOBALS['db_ab']->quote($getRow['ProductName']).",
".$getRow['Status'].",
".$GLOBALS['db_ab']->quote($getRow['created_date'])."
)
ON DUPLICATE KEY UPDATE count = (count + 1) , cron_modified_date = '".$getRow['created_date']."'" ;
$GLOBALS['db_ab']->execRaw( $insertRow ) ;
$GLOBALS['db_ab']->queryString = null;
$getRow = null;
$insertRow = null;
$GLOBALS['db_ab']->close() ;
}
gc_enable() ;
$totalCount = $totalCount- $recordsPerIteration;
$start += $recordsPerIteration ;
$getAllRows = null;
gc_collect_cycles() ;
}
}
After suggestions provided by @ABelikov and few hit & trail methods... Finally below code is working perfectly fine and freeing up the RAM after every 50K records inserts.
Below are major findings
Club the insert statements and execute the inserts in one go. Don't execute single record insertion in loop.
Thanks guys for valuable suggestions and helping out.
/* Fetch records count for batch insert*/
$queryCount = "SELECT count(*) as totalRecords FROM TABLE1 where created_date > = '2018-02-10'";
$rowsCount = $GLOBALS['db']->execRaw( $queryCount)->fetchAll();
$recordsPerIteration = 50000 ;
$totalCount = $rowsCount[0]['totalRecords'];
$start = 0;
if ( $totalCount > 0 ) {
while ( $totalCount > 0 ) {
$query = "SELECT * FROM TABLE1
WHERE where created_date > = '2018-02-10'
ORDER BY suggestion_id DESC
LIMIT ".$start.",".$recordsPerIteration;
print "sql is $query" ;
$getAllRows = $GLOBALS['db']->execRaw( $query )->fetchAll();
$GLOBALS['db']->queryString = null;
$GLOBALS['db']->close() ;
$insertRow = " INSERT INTO TABLE2 (
Name,
Company,
ProductName,
Status,
cron_modified_date)
VALUE ( " ;
foreach ($getAllRows as $getRow) {
$insertRow .= (".$GLOBALS['db_ab']->quote($getRow['Name']).",
".$GLOBALS['db_ab']->quote($getRow['Company']).",
".$GLOBALS['db_ab']->quote($getRow['ProductName']).",
".$getRow['Status'].",
".$GLOBALS['db_ab']->quote($getRow['created_date'])."),";
}
$insertRow=rtrim($insertRow,','); // Remove last ','
$insertRow.= " ON DUPLICATE KEY UPDATE count = (count + 1) , cron_modified_date = '".$getRow['created_date']."'" ;
$GLOBALS['db_ab']->execRaw( $insertRow ) ;
//Flushing all data to freeup RAM
$GLOBALS['db_ab'] = null ;
$GLOBALS['db'] = null ;
$insertRow = null;
$totalCount = $totalCount- $recordsPerIteration;
$start += $recordsPerIteration ;
$getAllRows = array();
$getAllRows = null;
print " \n Records needs to process ".$totalCount."\n";
}
}
1. Insert multiple rows solution
You can speed up you script by using "insert multiple rows" see here https://dev.mysql.com/doc/refman/5.5/en/insert.html
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
You need keep only VALUES part in your foreach and move out all other
$insertRow = " INSERT INTO TABLE2 (
Name,
Company,
ProductName,
Status,
cron_modified_date) VALUES ";
foreach ($getAllRows as $getRow) {
$insertRow.="(".$GLOBALS['db_ab']->quote($getRow['Name']).",
".$GLOBALS['db_ab']->quote($getRow['Company']).",
".$GLOBALS['db_ab']->quote($getRow['ProductName']).",
".$getRow['Status'].",
".$GLOBALS['db_ab']->quote($getRow['created_date'])."),";
}
$insertRow=rtrim($insertRow,','); // Remove last ','
$insertRow .= " ON DUPLICATE KEY UPDATE count = (count + 1) , cron_modified_date = '".$getRow['created_date']."'" ;
$GLOBALS['db_ab']->execRaw( $insertRow ) ;
$GLOBALS['db_ab']->queryString = null;
$getRow = null;
$insertRow = null;
$GLOBALS['db_ab']->close() ;
That will help only if your foreach "body" usually runs more than one time
2. MySQL sever-side solution
Try to use TRANSACTION https://dev.mysql.com/doc/refman/5.7/en/commit.html http://php.net/manual/en/pdo.begintransaction.php
Just begin one at start of script and commit at end. Depend on you server it can realy help. But be careful! It depend on your MySQL server config sets. Need testing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With