Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mysql locking rows in a select query for update

So I am writing a simple website crawler for maintenance of in house sites. It will go through each link, adding new links as it finds them, noting down title and h1 tags etc.

Occasionally it duplicates titles and H1 tags, when there is only one in the source when I check it manually.

The reason this is happening is because the crawl script is running via cron and it appears to be overlapping, so processing the same page twice.

The script will basically grab a page that has been uncrawled, then if the http response is 200 it will mark it as crawled, and process what it needs to.

So somewhere between the SELECT and the UPDATE, another thread of the script is running on the same row that was SELECTed.

Is there a way to either SELECT and UPDATE in the same query, or lock the row returned in the SELECT so it cannot be returned again in another query in another thread until I am finished with it?

Have had a look at - http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html and general SELECT FOR UPDATE stuff, but I am still unsure.

Edit

I am using something like this

START TRANSACTION;
SELECT .. FOR UPDATE;
UPDATE .... ;
COMMIT;

But its not liking it. Am def using InnoDB on that table. I am thinking this may not be the way forward, as it simply puts off the processing of the row until after the commit, when I want it to physically not be able to SELECT the row again.

I have covered this off by doing the SELECT, and then afterwards doing an UPDATE to flag a field as crawled before it processes it, but the fact that this is not seamless seems to be causing the problem. I need a way to seamlessly SELECT and UPDATE the field, or SELECT and stop it being SELECTed again until I say so.

like image 360
Horse Avatar asked Aug 11 '11 16:08

Horse


People also ask

Does MySQL lock a row for update?

FOR UPDATE : Any lock placed with the `FOR UPDATE` will not allow other transactions to read, update or delete the row. Other transaction can read this rows only once first transaction get commit or rollback.

Does SELECT for update lock read?

A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus, it sets the same locks a searched SQL UPDATE would set on the rows.

How do I lock a row in MySQL?

Record Locks A record lock is a lock on an index record. For example, SELECT c1 FROM t WHERE c1 = 10 FOR UPDATE; prevents any other transaction from inserting, updating, or deleting rows where the value of t. c1 is 10 . Record locks always lock index records, even if a table is defined with no indexes.

Does SELECT statement lock the rows?

SELECT statements get shared locks on the rows that satisfy the WHERE clause (but do not prevent inserts into this range). UPDATEs and DELETEs get exclusive locks on a range of rows. INSERT statements get exclusive locks on single rows (and sometimes on the preceding rows).


1 Answers

You answered the question yourself :). SELECT FOR UPDATE is exactly what you need if I understand your question correctly. Remember to turn off autocommit, start a transaction before select and commit the transaction after update.

Update:

I think this will do what you want:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT .. FOR UPDATE;
UPDATE .... ;
COMMIT TRANSACTION;
like image 173
snap Avatar answered Sep 25 '22 00:09

snap