Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count the number of attributes that are NULL for a row

I want to add a new column to a table to record the number of attributes whose value are null for each tuple (row). How can I use SQL to get the number?

for example, if a tuple is like this:

Name | Age | Sex
-----+-----+-----
Blice| 100 | null

I want to update the tuple as this:

Name | Age | Sex | nNULL
-----+-----+-----+--------
Blice| 100 | null|  1

Also, because I'm writing a PL/pgSQL function and the table name is obtained from argument, I don't know the schema of a table beforehand. That means I need to update the table with the input table name. Anyone know how to do this?

like image 676
Kingston Chan Avatar asked Jul 16 '15 02:07

Kingston Chan


1 Answers

Possible without spelling out columns. Unpivot columns to rows and count.

The aggregate function count(<expression>) only counts non-null values, while count(*) counts all rows. The shortest and fastest way to count NULL values for more than a few columns is count(*) - count(col) ...

Works for any table with any number of columns of any data types.

In Postgres 9.3+ with built-in JSON functions:

SELECT *, (SELECT count(*) - count(v)
           FROM json_each_text(row_to_json(t)) x(k,v)) AS ct_nulls
FROM   tbl t;

What is x(k,v)?

json_each_text() returns a set of rows with two columns. Default column names are key and value as can be seen in the manual where I linked. I provided table and column aliases so we don't have to rely on default names. The second column is named v.

Or, in any Postgres version since at least 8.3 with the additional module hstore installed, even shorter and a bit faster:

SELECT *,  (SELECT count(*) - count(v) FROM svals(hstore(t)) v) AS ct_nulls
FROM   tbl t;

This simpler version only returns a set of single values. I only provide a simple alias v, which is automatically taken to be table and column alias.

  • Best way to install hstore on multiple schemas in a Postgres database?

Since the additional column is functionally dependent I would consider not to persist it in the table at all. Rather compute it on the fly like demonstrated above or create a tiny function with a polymorphic input type for the purpose:

CREATE OR REPLACE FUNCTION f_ct_nulls(_row anyelement)
  RETURNS int  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (count(*) - count(v))::int FROM svals(hstore(_row)) v';

(PARALLEL SAFE only for Postgres 9.6 or later.)

Then:

SELECT *, f_ct_nulls(t) AS ct_nulls
FROM   tbl t;

You could wrap this into a VIEW ...

db<>fiddle here - demonstrating all
Old sqlfiddle

This should also answer your second question:

... the table name is obtained from argument, I don't know the schema of a table beforehand. That means I need to update the table with the input table name.

like image 107
Erwin Brandstetter Avatar answered Sep 23 '22 10:09

Erwin Brandstetter