I have an app that needs to update a large amount of data over a large number of entries. Basically it does some 7,000 inserts and/or updates but it takes a looooong time (like almost 9 minutes... averaging about 0.08 seconds per query). Essentially I'm looking for general speed ups to making multiple such requests (I'm not expecting a specific answer to my vague example... that's just to, hopefully, help explain).
Here are some samples from profiling the requests:
SELECT `habitable_planets`.* FROM `habitable_planets` WHERE (timestamp = '2010-10-15T07:30:00-07:00') AND (planet_id = '2010_Gl_581_c')
INSERT INTO `habitable_planets` (`planet_id`, `timestamp`, `weather_air_temp`, `weather_cell_temp`, `weather_irradiance`, `weather_wind_float`, `biolumin_to_date`, `biolumin_detected`, `craft_energy_usage`, `craft_energy_consumed_to_date`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
SELECT `habitable_planets`.* FROM `habitable_planets` WHERE (timestamp = '2010-10-15T07:45:00-07:00') AND (planet_id = '2010_Gl_581_c')
INSERT INTO `habitable_planets` (`planet_id`, `timestamp`, `weather_air_temp`, `weather_cell_temp`, `weather_irradiance`, `weather_wind_float`, `biolumin_to_date`, `biolumin_detected`, `craft_energy_usage`, `craft_energy_consumed_to_date`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Repeat ad nauseam (well, about 7,000 times). This is an update that collects data generated at intervals over a 24 hour period and then makes a massive update to the database once per day. Given the limited bit I've shown you are there any suggestions for speeding this process up?
For example... would it make sense to, rather than do a select for each timestamp, do one select for a range all at once and then iterate over them in the script?
Vaguely like:
SELECT `habitable_planets`.* FROM `habitable_planets` WHERE (planet_id = '2010_Gl_581_c')
assign that result to $foo
and then do:
foreach ($foo as $bar)
{
if ($bar['timestamp'] == $baz) // where $baz is the needed timestamp
{
// do the insert here
}
}
EDIT: To add a bit to this, one thing that improved responsiveness in my situation was to change a bunch of code that checked for a record existing and either did an insert or an update depending on the result into using an INSERT... ON DUPLICATE KEY UPDATE
sql query. This resulted in about a 30% speed gain in my particular case because it cut at least one trip to the database out of the equation and over thousands of requests this really adds up.
Some useful links:
From MySQL Documentation:
Speed of INSERT Statements says:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.
If multiple clients are inserting a lot of rows, you can get higher speed by using the INSERT DELAYED statement.
For a MyISAM table, you can use concurrent inserts to add rows at the same time that SELECT statements are running, if there are no deleted rows in middle of the data file.
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.
With some extra work, it is possible to make LOAD DATA INFILE run even faster for a MyISAM table when the table has many indexes.
If you're not already, use prepared statements (via either mysqli
, PDO
, or some other DB library that supports them). Reusing the same prepared statement and simply changing the parameter values will help speed things up, since the MySQL server only has to parse the query once.
INSERT
s can be batched by providing multiple sets of VALUES
- a single query that inserts many rows is much faster than an equivalent number of individual queries each inserting one row.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With