Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to SELECT COUNT from tables currently being INSERT?

Hi consider there is an INSERT statement running on a table TABLE_A, which takes a long time, I would like to see how has it progressed.

What I tried was to open up a new session (new query window in SSMS) while the long running statement is still in process, I ran the query

SELECT COUNT(1) FROM TABLE_A WITH (nolock)

hoping that it will return right away with the number of rows everytime I run the query, but the test result was even with (nolock), still, it only returns after the INSERT statement is completed.

What have I missed? Do I add (nolock) to the INSERT statement as well? Or is this not achievable?


(Edit) OK, I have found what I missed. If you first use CREATE TABLE TABLE_A, then INSERT INTO TABLE_A, the SELECT COUNT will work. If you use SELECT * INTO TABLE_A FROM xxx, without first creating TABLE_A, then non of the following will work (not even sysindexes).

like image 702
user1589188 Avatar asked Jul 22 '15 10:07

user1589188


2 Answers

Short answer: You can't do this.

Longer answer: A single INSERT statement is an atomic operation. As such, the query has either inserted all the rows or has inserted none of them. Therefore you can't get a count of how far through it has progressed.

Even longer answer: Martin Smith has given you a way to achieve what you want. Whether you still want to do it that way is up to you of course. Personally I still prefer to insert in manageable batches if you really need to track progress of something like this. So I would rewrite the INSERT as multiple smaller statements. Depending on your implementation, that may be a trivial thing to do.

like image 189
DavidG Avatar answered Oct 13 '22 15:10

DavidG


If you are using SQL Server 2016 the live query statistics feature can allow you to see the progress of the insert in real time.

The below screenshot was taken while inserting 10 million rows into a table with a clustered index and a single nonclustered index.

It shows that the insert was 88% complete on the clustered index and this will be followed by a sort operator to get the values into non clustered index key order before inserting into the NCI. This is a blocking operator and the sort cannot output any rows until all input rows are consumed so the operators to the left of this are 0% done.

enter image description here

With respect to your question on NOLOCK

It is trivial to test

Connection 1

USE tempdb

CREATE TABLE T2
  (
     X INT IDENTITY PRIMARY KEY,
     F CHAR(8000)
  );

WHILE NOT EXISTS(SELECT * FROM   T2 WITH (NOLOCK))
  LOOP:

SELECT COUNT(*) AS CountMethod FROM   T2 WITH (NOLOCK);

SELECT rows FROM   sysindexes WHERE  id = OBJECT_ID('T2');

RAISERROR ('Waiting for 10 seconds',0,1) WITH NOWAIT;

WAITFOR delay '00:00:10';

SELECT COUNT(*) AS CountMethod FROM   T2 WITH (NOLOCK);

SELECT rows FROM   sysindexes WHERE  id = OBJECT_ID('T2');

RAISERROR ('Waiting to drop table',0,1) WITH NOWAIT

DROP TABLE T2 

Connection 2

use tempdb;

--Insert 2000 * 2000 = 4 million rows
WITH T
     AS (SELECT TOP 2000 'x' AS x
         FROM   master..spt_values)
INSERT INTO T2
            (F)
SELECT 'X'
FROM   T v1
       CROSS JOIN T v2
OPTION (MAXDOP 1) 

Example Results - Showing row count increasing

enter image description here

SELECT queries with NOLOCK allow dirty reads. They don't actually take no locks and can still be blocked, they still need a SCH-S (schema stability) lock on the table (and on a heap it will also take a hobt lock).

The only thing incompatible with a SCH-S is a SCH-M (schema modification) lock. Presumably you also performed some DDL on the table in the same transaction (e.g. perhaps created it in the same tran)

For the use case of a large insert, where an approximate in flight result is fine, I generally just poll sysindexes as shown above to retrieve the count from metadata rather than actually counting the rows (non deprecated alternative DMVs are available)

When an insert has a wide update plan you can even see it inserting to the various indexes in turn that way.

If the table is created inside the inserting transaction this sysindexes query will still block though as the OBJECT_ID function won't return a result based on uncommitted data regardless of the isolation level in effect. It's sometimes possible to get around that by getting the object_id from sys.tables with nolock instead.

like image 26
Martin Smith Avatar answered Oct 13 '22 16:10

Martin Smith