I have a bunch of tables, several of which have hundreds of columns. I need to get a count of non-null values for each column, and I've been doing it manually. I would like to figure out a way to get the counts for all the columns in a table at once. I looked on Stack Overflow and Google but was unable to find an answer.
I tried the query below, but it just returns a value of 1 for each column. I know it's counting the column names themselves and not the values in each column. Any suggestions?
select count(COLUMN_NAME)
from information_schema.columns
where table_schema = 'schema_name'
and table_name = 'table_name'
group by COLUMN_NAME
The PostgreSQL COUNT function counts either rows or the non-NULL values in a specific column. When an asterisk (*) is used, COUNT(*) returns the total number of rows.
COUNT(column_name), on the other hand, always gives you the count of non-NULL values in that column.
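A quick illustration (using a small hypothetical table t with a nullable column c) shows the difference as soon as the column contains NULLs:
CREATE TABLE t (c integer);
INSERT INTO t VALUES (1), (NULL), (3);
SELECT count(*) FROM t;   -- returns 3: all rows
SELECT count(c) FROM t;   -- returns 2: only the non-NULL values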
Create a generic function like this, which takes the schema name and table name as arguments.
It constructs one SELECT statement per column, each returning the column name and its non-NULL count, joins them together with UNION ALL, and executes the result dynamically.
CREATE OR REPLACE FUNCTION public.get_count( TEXT, TEXT )
RETURNS TABLE( t_column_name TEXT, t_count BIGINT )
LANGUAGE plpgsql
AS $BODY$
DECLARE
    p_schema        TEXT := $1;
    p_tabname       TEXT := $2;
    v_sql_statement TEXT;
BEGIN
    -- Build one "SELECT '<column>', count(<column>) FROM <schema>.<table>"
    -- per column and join the statements with UNION ALL.
    -- quote_ident() keeps mixed-case or otherwise unusual identifiers safe.
    SELECT STRING_AGG( 'SELECT '
                       || quote_literal( column_name )
                       || ', count( '
                       || quote_ident( column_name )
                       || ' ) FROM '
                       || quote_ident( table_schema )
                       || '.'
                       || quote_ident( table_name )
                     , ' UNION ALL ' )
      INTO v_sql_statement
      FROM information_schema.columns
     WHERE table_schema = p_schema
       AND table_name   = p_tabname;

    -- Nothing to run if the table has no columns (or does not exist).
    IF v_sql_statement IS NOT NULL THEN
        RETURN QUERY EXECUTE v_sql_statement;
    END IF;
END
$BODY$;
Execution
knayak=# select c.col, c.count from
public.get_count( 'public', 'employees' ) as c(col,count);
col | count
----------------+-------
employee_id | 107
first_name | 107
last_name | 107
email | 107
phone_number | 107
hire_date | 107
job_id | 107
salary | 107
commission_pct | 35
manager_id | 106
department_id | 106
(11 rows)
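Under the hood, the statement the function assembles and executes looks roughly like this (truncated here to the first few columns of the employees table from the example):
SELECT 'employee_id', count( employee_id ) FROM public.employees
UNION ALL
SELECT 'first_name', count( first_name ) FROM public.employees
UNION ALL
SELECT 'last_name', count( last_name ) FROM public.employees
-- ...one SELECT per remaining column...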
There's not really a magic way to do this. If you need to check each of 100 different columns to see how many non-null values there are, you'll have to specify each of the columns of the table.
About the best you can do is use the system catalogs to help write your queries:
select 'SUM(CASE WHEN ' || column_name || ' IS NOT NULL THEN 1 ELSE 0 END) AS ' || column_name
from information_schema.columns
where table_schema = 'schema_name'
and table_name = 'table_name'
and is_nullable = 'YES'
You may need to add quoted identifiers if you've got spaces or other special characters in your column names.
Then you can copy that output into another query and add the missing parts (the SELECT keyword, commas between the expressions, and the FROM clause). I've added and is_nullable = 'YES'
because it's a waste of time to check NOT NULL columns; that column is available in PostgreSQL's information_schema.columns.
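For example, taking a few of the nullable columns from the employees table shown earlier, the finished query would look something like this (the schema and table names are placeholders for your own):
SELECT
  SUM(CASE WHEN commission_pct IS NOT NULL THEN 1 ELSE 0 END) AS commission_pct,
  SUM(CASE WHEN manager_id IS NOT NULL THEN 1 ELSE 0 END) AS manager_id,
  SUM(CASE WHEN department_id IS NOT NULL THEN 1 ELSE 0 END) AS department_id
FROM public.employees;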