When writing database queries in something like TSQL or PLSQL, we often have a choice of iterating over rows with a cursor to accomplish the task, or crafting a single SQL statement that does the same job all at once.
Also, we have the choice of simply pulling a large set of data back into our application and then processing it row by row, with C# or Java or PHP or whatever.
Why is it better to use set-based queries? What is the theory behind this choice? What is a good example of a cursor-based solution and its relational equivalent?
Cursors could be used in some applications for serialized operations as shown in example above, but generally they should be avoided because they bring a negative impact on performance, especially when operating on a large sets of data.
Temporary tables have been in use for a long time and provide an excellent way to replace cursors for large data sets. Just like table variables, temporary tables can hold the result set so that we can perform the necessary operations by processing it with an iterating algorithm such as a 'while' loop.
The main reason that I'm aware of is that set-based operations can be optimised by the engine by running them across multiple threads. For example, think of a quicksort - you can separate the list you're sorting into multiple "chunks" and sort each separately in their own thread. SQL engines can do similar things with huge amounts of data in one set-based query.
When you perform cursor-based operations, the engine can only run sequentially and the operation has to be single threaded.
Set based queries are (usually) faster because:
Pulling data out to the middle tier to process it can be useful, though, because it removes the processing overhead off the DB server (which is the hardest thing to scale, and is normally doing other things as well). Also, you normally don't have the same overheads (or benefits) in the middle tier. Things like transactional logging, built-in locking and blocking, etc. - sometimes these are necessary and useful, other times they're just a waste of resources.
A simple cursor with procedural logic vs. set based example (T-SQL) that will assign an area code based on the telephone exchange:
--Cursor DECLARE @phoneNumber char(7) DECLARE c CURSOR LOCAL FAST_FORWARD FOR SELECT PhoneNumber FROM Customer WHERE AreaCode IS NULL OPEN c FETCH NEXT FROM c INTO @phoneNumber WHILE @@FETCH_STATUS = 0 BEGIN DECLARE @exchange char(3), @areaCode char(3) SELECT @exchange = LEFT(@phoneNumber, 3) SELECT @areaCode = AreaCode FROM AreaCode_Exchange WHERE Exchange = @exchange IF @areaCode IS NOT NULL BEGIN UPDATE Customer SET AreaCode = @areaCode WHERE CURRENT OF c END FETCH NEXT FROM c INTO @phoneNumber END CLOSE c DEALLOCATE c END --Set UPDATE Customer SET AreaCode = AreaCode_Exchange.AreaCode FROM Customer JOIN AreaCode_Exchange ON LEFT(Customer.PhoneNumber, 3) = AreaCode_Exchange.Exchange WHERE Customer.AreaCode IS NULL
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With