Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to build large MySQL INSERT query in PHP without wasting memory

Tags:

php

mysql

I have code looking something like this:

$data = file_get_contents($tempFile); // perhaps 30MB of file data, now in PHP's memory
$hash = md5($data);
$query = "INSERT INTO some_table
          SET BlobData = '" . mysql_real_escape_string($data) . "',
          BlobHash = '$hash'
          ";
mysql_query($query);

I know this isn't very efficient as each of the '.' operators will reallocate a bigger memory block and the 30MB string will be copied several times.

Is there anything more efficient than the following solution?

$data = file_get_contents($tempFile); // perhaps 30MB of file data, now in PHP's memory
$hash = md5($data);
$query = "INSERT INTO some_table SET BlobData = '%s', BlobHash = '$hash'";
mysql_query(sprintf($query, mysql_real_escape_string($data)));
like image 881
too much php Avatar asked Oct 17 '08 06:10

too much php


2 Answers

You have two issues here:

#1, there are several different ways you can compute the MD5 hash:

  • Do as you do and load into PHP as a string and use PHP's md5()
  • Use PHP's md5_file()
  • As of PHP 5.1+ you can use PHP's streams API with either of md5 or md5_file to avoid loading entirely into memory
  • Use exec() to call the system's md5sum command
  • Use MySQL's MD5() function to compute the hash

Since these are all trivial to implement it would be easy for you to implement and benchmark them all for memory usage and speed. Here are some benchmarks showing system md5 via exec to be a lot faster than PHP's md5_file as file size increases. Doing it your way is definitely the worst way as far as memory usage is concerned.

#2, mysql_real_escape_string performs a database query, so you're actually transmitting your blob data to the database, getting it back as a string, and transmitting it again(!) with the INSERT query. So it's traveling to/from the DB server 3x instead of 1x and using 2x the memory in PHP.

It's going to be more efficient to use PHP5 prepared statements and only send this data to the database once. Read the linked article section, you'll see it mentions that when you are binding parameters, you can use the blob type to stream blob data to the DB in chunks. The PHP docs for mysqli_stmt::send_long_data have a great simple example of this that INSERTs a file into a blob column just like you are.

By doing that, and by using either the streams API, md5_file or exec with the system md5 command, you can do your entire INSERT without ever loading the entire file into memory, which means memory usage for your series of operations can be as low as you want!

like image 79
joelhardi Avatar answered Nov 18 '22 19:11

joelhardi


If you are using PDO, and prepared statments you can use the PDO::PARAM_LOB type. See example #2 on the LOB page showing how to insert an image to a database using the file pointer.

http://us2.php.net/manual/en/pdo.lobs.php

like image 34
Zoredache Avatar answered Nov 18 '22 18:11

Zoredache