Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to count exact number of rows in a very large table?

Tags:

sql

database

People also ask

What is the most performant way to get total number of records from a table?

The best way to get the record count is to use the sys. dm_db_partition_stats or sys. partitions system views (there is also sysindexes, but it has been left for the backward compatibility with SQL Server 2000).

What is the method to count the number of rows in a table?

To count the number of rows, the “#Table_Id tr” selector is used. It selects all the <tr> elements in the table. This includes the row that contains the heading of the table. The length property is used on the selected elements to get the number of rows.

What is fastest way to execute the query with millions of records?

1:- Check Indexes. 2:- There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement 3:- Limit Size of Your Working Data Set. 4:- Only Select Fields You select as Need. 5:- Remove Unnecessary Table and index 6:- Remove OUTER JOINS.

How do you make the select count faster?

So to make SELECT COUNT(*) queries fast, here's what to do:Get on any version that supports batch mode on columnstore indexes, and put a columnstore index on the table – although your experiences are going to vary dramatically depending on the kind of query you have.


Simple answer:

  • Database vendor independent solution = use the standard = COUNT(*)
  • There are approximate SQL Server solutions but don't use COUNT(*) = out of scope

Notes:

COUNT(1) = COUNT(*) = COUNT(PrimaryKey) just in case

Edit:

SQL Server example (1.4 billion rows, 12 columns)

SELECT COUNT(*) FROM MyBigtable WITH (NOLOCK)
-- NOLOCK here is for me only to let me test for this answer: no more, no less

1 runs, 5:46 minutes, count = 1,401,659,700

--Note, sp_spaceused uses this DMV
SELECT
   Total_Rows= SUM(st.row_count)
FROM
   sys.dm_db_partition_stats st
WHERE
    object_name(object_id) = 'MyBigtable' AND (index_id < 2)

2 runs, both under 1 second, count = 1,401,659,670

The second one has less rows = wrong. Would be the same or more depending on writes (deletes are done out of hours here)


The fastest way by far on MySQL is:

SHOW TABLE STATUS;

You will instantly get all your tables with the row count (which is the total) along with plenty of extra information if you want.


I got this script from another StackOverflow question/answer:

SELECT SUM(p.rows) FROM sys.partitions AS p
  INNER JOIN sys.tables AS t
  ON p.[object_id] = t.[object_id]
  INNER JOIN sys.schemas AS s
  ON s.[schema_id] = t.[schema_id]
  WHERE t.name = N'YourTableNameHere'
  AND s.name = N'dbo'
  AND p.index_id IN (0,1);

My table has 500 million records and the above returns in less than 1ms. Meanwhile,

SELECT COUNT(id) FROM MyTable

takes a full 39 minutes, 52 seconds!

They yield the exact same number of rows (in my case, exactly 519326012).

I do not know if that would always be the case.


You can try this sp_spaceused (Transact-SQL)

Displays the number of rows, disk space reserved, and disk space used by a table, indexed view, or Service Broker queue in the current database, or displays the disk space reserved and used by the whole database.


I have come across articles that state that SELECT COUNT(*) FROM TABLE_NAME will be slow when the table has lots of rows and lots of columns.

That depends on the database. Some speed up counts, for instance by keeping track of whether rows are live or dead in the index, allowing for an index only scan to extract the number of rows. Others do not, and consequently require visiting the whole table and counting live rows one by one. Either will be slow for a huge table.

Note that you can generally extract a good estimate by using query optimization tools, table statistics, etc. In the case of PostgreSQL, for instance, you could parse the output of explain count(*) from yourtable and get a reasonably good estimate of the number of rows. Which brings me to your second question.

I have a table that might contain even billions of rows [it has approximately 15 columns]. Is there a better way to get the EXACT count of the number of rows of a table?

Seriously? :-) You really mean the exact count from a table with billions of rows? Are you really sure? :-)

If you really do, you could keep a trace of the total using triggers, but mind concurrency and deadlocks if you do.