
Postgres: Find number of distinct values for each column

I am trying to find the number of distinct values in each column of a table. In pseudocode:

for each column of table xyz
run_query("SELECT COUNT(DISTINCT column) FROM xyz")      

The column names of a table can be read from information_schema:

SELECT column_name 
FROM information_schema.columns
WHERE table_name = 'xyz'

However, I have not managed to combine this with the count query. I tried several queries. This one:

SELECT column_name, thecount
FROM information_schema.columns, 
   (SELECT COUNT(DISTINCT column_name) FROM myTable) AS thecount
WHERE table_name=myTable

is syntactically not allowed (reference to column_name in the nested query not allowed).

This one does not work either (it times out):

SELECT column_name, count(distinct column_name) 
FROM information_schema.columns, myTable
WHERE table_name=myTable

What is the right way to get the number of distinct values for each column of a table with one query?

The article "SQL to find the number of distinct values in a column" covers a fixed column only.

asked Nov 11 '22 at 17:11 by 7on


1 Answer

In general, SQL expects the names of items (fields, tables, roles, indices, constraints, etc.) in a statement to be constant. The fact that many database systems let you examine the structure through something like information_schema does not mean you can plug that data into the running statement as identifiers.

You can however use the information_schema to construct new SQL statements that you execute separately.

First consider your original problem.

CREATE TABLE foo (a numeric, b numeric, c numeric);

INSERT INTO foo(a,b,c)
     VALUES (1,1,1), (1,1,2), (1,1,3), (1,2,1), (1,2,2);

SELECT COUNT(DISTINCT a) "distinct a",
       COUNT(DISTINCT b) "distinct b",
       COUNT(DISTINCT c) "distinct c"
  FROM foo;

If you know the names of all your columns when you are writing the query, that is sufficient. For the sample rows above it returns 1, 2 and 3 for a, b and c respectively.

If you are seeking data for an arbitrary table, you need to construct the SQL statement via SQL (I've added plenty of whitespace so you can see the different levels involved):

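-- Build the text of a single SELECT that counts the distinct values of every column of foo.
SELECT 'SELECT ' || STRING_AGG(   'COUNT (DISTINCT '
                               || quote_ident(column_name)   -- quotes names that need it (mixed case, spaces)
                               || ') "'
                               || column_name
                               || '"',
                               ','
                               ORDER BY ordinal_position)    -- keep the table's column order
                 || ' FROM foo;'
  FROM information_schema.columns
 WHERE table_name='foo';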
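
To make the two levels concrete: for the sample table foo defined earlier, the generated text should look roughly like the statement below. The column order may vary if the aggregate is not explicitly ordered, and the line breaks are added here only for readability (the query emits a single line).

```sql
-- Text produced by the generator query for table foo (reformatted for readability).
SELECT COUNT (DISTINCT a) "a",
       COUNT (DISTINCT b) "b",
       COUNT (DISTINCT c) "c"
  FROM foo;
```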

That, however, is only the text of the necessary SQL statement. Depending on how you access PostgreSQL, it may be easy to feed that text into a new query from your client. If you want to keep everything inside PostgreSQL, you will have to resort to one of its integrated procedural languages, such as PL/pgSQL. An excellent (though complex) discussion of the issues may provide guidance.
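
As one possible way of keeping everything inside PostgreSQL, here is a minimal PL/pgSQL sketch. The function name count_distinct_per_column, its parameters and its output column names are hypothetical (they are not part of the answer above), and it assumes the table lives in the schema you pass in. Instead of building one wide statement, it runs one dynamic COUNT(DISTINCT ...) per column and returns one row per column.

```sql
-- Hypothetical helper, for illustration only: one result row per column
-- of the given table, with the number of distinct non-NULL values.
CREATE OR REPLACE FUNCTION count_distinct_per_column(p_schema text, p_table text)
RETURNS TABLE (col_name text, n_distinct bigint)
LANGUAGE plpgsql
AS $$
DECLARE
    col record;
BEGIN
    -- Walk the table's columns in their declared order.
    FOR col IN
        SELECT c.column_name
          FROM information_schema.columns AS c
         WHERE c.table_schema = p_schema
           AND c.table_name   = p_table
         ORDER BY c.ordinal_position
    LOOP
        col_name := col.column_name;
        -- format() with %I quotes each identifier safely before EXECUTE runs the text.
        EXECUTE format('SELECT COUNT(DISTINCT %I) FROM %I.%I',
                       col.column_name, p_schema, p_table)
           INTO n_distinct;
        RETURN NEXT;
    END LOOP;
END;
$$;

-- Usage, assuming the sample table foo was created in schema public:
SELECT * FROM count_distinct_per_column('public', 'foo');
```

For the sample data this should report 1, 2 and 3 distinct values for a, b and c. Note that the sketch scans the table once per column, and that it will raise an error for columns whose type has no equality operator (json, for example), since COUNT(DISTINCT ...) needs one.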

answered Nov 15 '22 at 07:11 by gwaigh