I have a very large table with a primary key of BINARY(20)
.
The table has around 17 million rows. Every hour a cron job tries to insert as many as 50,000 new entries into this table with the ON_DUPLICATE_KEY_UPDATE
syntax.
Each insert in the cronjob is with 1,000 values (multiple insert). How can I get the number of rows inserted into the table from this query? I cannot do a row count before and after as there are around 17million rows and the query is too expensive.
In the manual mysql says for a row inserted the affected number of rows is 1
and for an updated field it is 2
, meaning in my 1000 INSERT ON DUPLICATE KEY UPDATE query I could have affected rows ranging from 1000 - 2000, but I have no way of telling how many records were inserted from this number?
How can I overcome this?
Thanks
INSERT ... ON DUPLICATE KEY UPDATE is a MariaDB/MySQL extension to the INSERT statement that, if it finds a duplicate unique or primary key, will instead perform an UPDATE. The row/s affected value is reported as 1 if a row is inserted, and 2 if a row is updated, unless the API's CLIENT_FOUND_ROWS flag is set.
By definition, atomicity requires that each transaction is an all or nothing. So yes it is atomic in the sense that if the data that you are trying to insert will cause a duplicate in the primary key or in the unique index, the statement will instead perform an update and not error out.
ON DUPLICATE KEY UPDATE inserts or updates a row, the LAST_INSERT_ID() function returns the AUTO_INCREMENT value. The ON DUPLICATE KEY UPDATE clause can contain multiple column assignments, separated by commas. The use of VALUES() to refer to the new row and columns is deprecated beginning with MySQL 8.0.
The whole idea of a primary key is to have a unique identifier for each row, so you can not do that. However, if you want a way of grouping rows, you can either add a group column to your table, or create a table for the grouping. For example group_members and have that contain two columns, "group_id" and "row_id".
The number of inserts would be 2000 minus the number of affected rows. More generally:
(numberOfValuesInInsert * 2) - mysql_affected_rows()
EDIT:
As tomas points out, The MySQL docs actually say:
With ON DUPLICATE KEY UPDATE, the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values.
[emphasis mine]
Consequently, if setting an existing row to the same values is a possibility, it's impossible to tell how many rows were updated vs. inserted, since two inserts would be indistinguishable from one update with different values + one update with the same values.
When Your job does an Insert of 1000 , some are pure Inserts and some are Updates as you have the ON_DUPLICATE_KEY_UPDATE . Thus you get the first equation
(1) Inserts + Updates = No of rows Inserted( in this case 1000)
I take a simple example where you get a value of 1350 for the my_sql_affected_rows . since for an Insert a value of 1 and for update a value of 2 aggregates to my_sql_affected_rows . I get the following equation .
(2) Inserts + 2 * Updates = my_sql_affected_rows (in this case 1350) .
Subtract (2) - (1) . You get
(3) Updates = my_sql_affected_rows - No of rows Inserted
Updates = 1350 - 1000 ( in this example ).
Updates = 350 .
Substitute value of Updates in equation (1) , you get
Inserts = 650
Thus to get the number of Updates , you only need to use equation (3) directly .
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With