PHP: fastest way to register millions of records in MySQL

I have to register millions of page views in my DB, and I'm looking for the best solution to decrease the server load.

1. Current solution: check if the view is unique, then record it in both the "raw" table and the "optimized" table

// script
$checkUnique = mysqli_query( $con, "SELECT id FROM rawTable
         WHERE DATE(datetime) = '$today' AND ip = '$ip'
         ORDER BY datetime DESC LIMIT 1" );
mysqli_query( $con, "INSERT INTO rawTable ( id, datetime, url, ip, ua )
         VALUES ( NULL, '$now', '$url', '$ip', '$ua' )" );
if( mysqli_num_rows( $checkUnique ) == 0 ) {
    // first view from this IP today -> start a new daily counter
    mysqli_query( $con, "INSERT INTO optimizedTable ( id, day, total )
                         VALUES ( NULL, '$today', 1 )" );
} else {
    // repeat view -> bump today's counter
    mysqli_query( $con, "UPDATE optimizedTable SET total = total + 1
            WHERE day = '$today' ORDER BY day DESC LIMIT 1" );
}

2. Register the views only in the "raw" table, then populate the "optimized" table with a cronjob

// script
mysqli_query( $con, "INSERT INTO rawTable ( id, datetime, url, ip, ua, alreadyOptimized )
         VALUES ( NULL, '$now', '$url', '$ip', '$ua', 0 )" );

// cronjob -> check which views are unique, populate the mysql tables +
//            change column alreadyOptimized from 0 to 1 in the raw table
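A minimal sketch of what that cronjob could run, assuming optimizedTable has a UNIQUE key on day (IPs that span two cron runs would be double-counted; handling that is left out for brevity):

// cronjob sketch: fold the not-yet-optimized rows into the daily totals
mysqli_query( $con, "INSERT INTO optimizedTable ( day, total )
         SELECT DATE(datetime), COUNT(DISTINCT ip) FROM rawTable
         WHERE alreadyOptimized = 0
         GROUP BY DATE(datetime)
         ON DUPLICATE KEY UPDATE total = total + VALUES(total)" );
mysqli_query( $con, "UPDATE rawTable SET alreadyOptimized = 1
         WHERE alreadyOptimized = 0" );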

3. Register the raw views in a txt or csv file, then populate the mysql tables with a cronjob

// script
$file = fopen("file.txt", "a"); // "a" appends; "w" would overwrite the file on every view
fwrite($file, "$now,$url,$ip,$ua\n");
fclose($file);

// cronjob -> check which views are unique, populate the mysql tables + delete rows from the txt/csv file
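With the file route, the cronjob can bulk-load the whole file in one statement instead of inserting row by row; a sketch, assuming local_infile is enabled on the server and the path is adjusted to your setup:

// cronjob sketch: bulk-load the buffered views, then empty the file
mysqli_query( $con, "LOAD DATA LOCAL INFILE 'file.txt'
         INTO TABLE rawTable
         FIELDS TERMINATED BY ','
         LINES TERMINATED BY '\\n'
         ( datetime, url, ip, ua )" );
file_put_contents( 'file.txt', '' ); // truncate the processed file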

What is the best (lightest and fastest) way? Are there any better solutions?

PS: The server load is caused by the SELECT query that checks whether the views are unique.

asked Feb 12 '16 by ipel


1 Answer

Manually SELECTing to check whether a record exists is the worst thing you can do: it can (and will) produce false results, because another process can insert the same row in the gap between your SELECT and your INSERT. The only proper way is to place a UNIQUE constraint on the table and simply INSERT. That's the only way to be 100% certain your DB won't contain duplicates.

The reason this is interesting for your use case is that it cuts your code down by 50%: you no longer need to SELECT first, so you get rid of a huge bottleneck.

Use INSERT IGNORE, or INSERT ... ON DUPLICATE KEY UPDATE if you need to update the existing record instead.
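For this question, that maps directly onto the "insert or increment" branch; a minimal sketch, assuming optimizedTable gets a UNIQUE key on its day column:

-- duplicate raw views are silently dropped by the unique key
INSERT IGNORE INTO rawTable ( datetime, url, ip, ua )
VALUES ( '$now', '$url', '$ip', '$ua' );

-- the first view of the day inserts, every later one increments
INSERT INTO optimizedTable ( day, total )
VALUES ( '$today', 1 )
ON DUPLICATE KEY UPDATE total = total + 1;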

Your unique constraint would be a compound index on the (datetime, ip) columns. To optimize even further, you can add a binary(20) column to the table and have it hold a sha1 hash of the datetime, ip combination (sha1 produces exactly 20 bytes). Using a trigger, you can compute the hash before inserting, making the whole process invisible to whoever actually inserts into the table.
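A sketch of both variants (the uq_* key names, the visit_hash column and the trigger name are made up for illustration):

-- plain compound unique key
ALTER TABLE rawTable ADD UNIQUE KEY uq_visit ( datetime, ip );

-- or the hash variant: a binary(20) column filled by a trigger
ALTER TABLE rawTable ADD COLUMN visit_hash BINARY(20),
                     ADD UNIQUE KEY uq_visit_hash ( visit_hash );

CREATE TRIGGER rawTable_before_insert BEFORE INSERT ON rawTable
FOR EACH ROW
SET NEW.visit_hash = UNHEX( SHA1( CONCAT( NEW.datetime, NEW.ip ) ) );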

If the insert fails, the record exists; if it succeeds, you've done what you wanted. Dropping the SELECT entirely is what yields the performance gain. If it's still slow after that, you've simply hit the I/O limit of the server you use, and you need to look for optimizations at the hardware level.

answered Nov 12 '22 by Mjh