Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query Number of Populated Rows for Each Column in Table (SQL Server)

I am trying to get a count of non-null values for each column in my table. I have researched the following previously asked SO questions, and have not found a satisfactory answer:

Query to list number of records in each table in a database

get a count of each value from every column in a table SQL Server

SQL Server count number of distinct values in each column of a table

I have written the following code to try to build out a data dictionary for my table to include the name of the column, the number of populated rows in each column, the data type, the length, and whether it is a primary key:

SELECT 
    c.name 'Column Name',
    p.rows 'Row_Count',
    t.Name 'Data type',
    c.max_length 'Max Length',
    ISNULL(i.is_primary_key, 0) 'Primary Key'
FROM    
    sys.columns c
INNER JOIN 
    sys.types t ON c.user_type_id = t.user_type_id
LEFT OUTER JOIN 
    sys.index_columns ic ON ic.object_id = c.object_id AND ic.column_id = c.column_id
LEFT OUTER JOIN 
    sys.indexes i ON ic.object_id = i.object_id AND ic.index_id = i.index_id
LEFT OUTER JOIN
sys.partitions p ON p.OBJECT_ID = i.OBJECT_ID and i.index_id = p.index_id
WHERE
    c.object_id = OBJECT_ID('my_table')

However, the Row_Count column returns all Nulls.

My expected outcome looks like this:

Column_Name Row_Count Data_Type Max_Length Primary_Key
A              10       varchar   50            0
B              10       varchar   50            0
C              7        float     50            0
D              3        float     50            0
E              10       varchar   50            0
like image 222
thedatasleuth Avatar asked Jan 02 '23 00:01

thedatasleuth


2 Answers

Here is one option that does NOT use Dynamic SQL.

Full disclosure, I suspect DS would be more performant. That said, this would work on any virtually any table, view or query. I'm using master..spt_values as a demonstration

Example

Select ColumnName      = Item
      ,B.column_ordinal
      ,Row_Count       = sum(1)
      ,B.system_type_name
      ,B.max_length
      ,Distinct_Values = count(DISTINCT Value)
 From  (
        Select C.*
         From  master..spt_values A
         Cross Apply ( values (cast((Select A.* for XML RAW) as xml))) B(XMLData)
         Cross Apply (
                        Select Item  = replace(xAttr.value('local-name(.)', 'varchar(100)'),'_x0020_',' ')
                              ,Value = xAttr.value('.','varchar(max)')
                         From  XMLData.nodes('//@*') xNode(xAttr)
                     ) C
       ) A
 Left Join  (
        Select * from sys.dm_exec_describe_first_result_set('Select * from master..spt_values',null,null )  
       ) B on A.Item=B.name
 Group By A.Item
         ,B.system_type_name
         ,B.max_length
         ,B.column_ordinal 
 Order By B.column_ordinal 

Returns

enter image description here

EDIT

As Larnu mentioned this will fail with (var)binary and image. Again, this will NOT perform well on a large table. I've only used such approaches during my discovery phase.

like image 95
John Cappelletti Avatar answered Jan 08 '23 01:01

John Cappelletti


God this is ugly...

DECLARE @SQL nvarchar(MAX);
DECLARE @Table sysname = 'SampleTable';
DECLARE @Schema sysname = 'dbo';

SET @SQL = N'WITH Counts AS (' + NCHAR(13) + NCHAR(10) + 
           N'    SELECT @Schema AS SchemaName,' + NCHAR(13) + NCHAR(10) +
           N'           @Table AS TableName,' +
           STUFF((SELECT N',' + NCHAR(13) + NCHAR(10) + 
                         N'           COUNT(' + QUOTENAME(C.COLUMN_NAME) + N') AS ' + QUOTENAME(COLUMN_NAME)
                  FROM INFORMATION_SCHEMA.COLUMNS C
                  WHERE C.TABLE_SCHEMA = @Schema
                    AND C.TABLE_NAME = @Table
                  FOR XML PATH(N''),TYPE).value('.','nvarchar(MAX)'),1,14,N'') + NCHAR(13) + NCHAR(10) + 
           N'    FROM ' + QUOTENAME(@Table) + N')' + NCHAR(13) + NCHAR(10) + 
           N'SELECT V.ColumnName,' + NCHAR(13) + NCHAR(10) + 
           N'       V.NonNullCount,' + NCHAR(13) + NCHAR(10) + 
           N'       ISC.DATA_TYPE + ISNULL(DT.S,'''') AS Datatype,' + NCHAR(13) + NCHAR(10) +
           N'       ISNULL(PK.IsPrimaryKey,''No'') AS PrimaryKey' + NCHAR(13) + NCHAR(10) +
           N'FROM Counts C' + NCHAR(13) + NCHAR(10) + 
           N'     CROSS APPLY(VALUES' + STUFF((SELECT N',' + NCHAR(13) + NCHAR(10) + 
                                                      N'                       (N' + QUOTENAME(C.COLUMN_NAME,'''') + N',C.' + QUOTENAME(C.COLUMN_NAME) + N')'
                                               FROM INFORMATION_SCHEMA.COLUMNS C
                                               WHERE C.TABLE_NAME = @Table
                                               FOR XML PATH(N''),TYPE).value('.','nvarchar(MAX)'),1,26,N'') + N')V(ColumnName,NonNullCount)' + NCHAR(13) + NCHAR(10) +
           N'     JOIN INFORMATION_SCHEMA.COLUMNS ISC ON C.SchemaName = ISC.TABLE_SCHEMA' + NCHAR(13) + NCHAR(10) +
           N'                                        AND C.TableName = ISC.TABLE_NAME' + NCHAR(13) + NCHAR(10) +
           N'                                        AND V.ColumnName = ISC.COLUMN_NAME' + NCHAR(13) + NCHAR(10) + 
           N'     CROSS APPLY (VALUES(''('' + STUFF(CONCAT('','' + CASE ISC.CHARACTER_MAXIMUM_LENGTH WHEN -1 THEN ''MAX'' ELSE CONVERT(varchar(4),ISC.CHARACTER_MAXIMUM_LENGTH) END,' + NCHAR(13) + NCHAR(10)+
           N'                                            '','' + CASE WHEN ISC.DATA_TYPE NOT LIKE ''%int'' THEN CONVERT(varchar(4),ISC.NUMERIC_PRECISION) END,' + NCHAR(13) + NCHAR(10) +
           N'                                            '','' + CASE WHEN ISC.DATA_TYPE NOT LIKE ''%int'' THEN CONVERT(varchar(4),ISC.NUMERIC_SCALE) END,' + NCHAR(13) + NCHAR(10) +
           N'                                            '','' + CONVERT(varchar(4),ISC.DATETIME_PRECISION)),1,1,'''') + '')'')) DT(S)' + NCHAR(13) + NCHAR(10) +
           N'     OUTER APPLY(SELECT ''Yes'' AS IsPrimaryKey ' + NCHAR(13) + NCHAR(10) + 
           N'                 FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS TC' + NCHAR(13) + NCHAR(10) + 
           N'                      JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU ON TC.TABLE_SCHEMA = KCU.TABLE_SCHEMA' + NCHAR(13) + NCHAR(10) + 
           N'                                                                  AND TC.TABLE_NAME = KCU.TABLE_NAME' + NCHAR(13) + NCHAR(10) + 
           N'                                                                  AND TC.CONSTRAINT_NAME = KCU.CONSTRAINT_NAME' + NCHAR(13) + NCHAR(10) + 
           N'                 WHERE TC.CONSTRAINT_TYPE = ''PRIMARY KEY''' + NCHAR(13) + NCHAR(10) + 
           N'                   AND KCU.COLUMN_NAME = V.ColumnName' + NCHAR(13) + NCHAR(10) + 
           N'                   AND TC.TABLE_SCHEMA = ISC.TABLE_SCHEMA' + NCHAR(13) + NCHAR(10) + 
           N'                   AND TC.TABLE_NAME = ISC.TABLE_NAME) PK;';

PRINT @SQL; --Might need to use SELECT here
--SELECT @SQL;
EXEC sp_executesql @SQL, N'@Schema sysname,@Table sysname',@Schema = @Schema, @Table = @Table;

db<>fiddle

Honestly, there is a lot going on here. If it needs an explanation, I will try, but it will take some time, so (and no offence), i'm not going to put the effort in if no one is going to want/need to know.

One note, I have been slightly lazy and no joined on SCHEMA_NAME. If you are using multiple schemas, with same named objects, this will have problem, and will need to be addressed.

Edit: apparently I'm a glutton for punishment. Fixed the Schema "problem" and added some logic in regards to ints

New Fiddle

like image 40
Larnu Avatar answered Jan 08 '23 00:01

Larnu