I am trying to find the number of distinct values in each column of a table. In pseudocode, that is:
for each column of table xyz
run_query("SELECT COUNT(DISTINCT column) FROM xyz")
The column names of a table can be found like this:
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'xyz'
However, I can't work out how to combine this with the count query. I tried various queries. This one:
SELECT column_name, thecount
FROM information_schema.columns,
(SELECT COUNT(DISTINCT column_name) FROM myTable) AS thecount
WHERE table_name=myTable
is syntactically not allowed (reference to column_name in the nested query not allowed).
This one seems erroneous too (timeout):
SELECT column_name, count(distinct column_name)
FROM information_schema.columns, myTable
WHERE table_name=myTable
What is the right way to get the number of distinct values for each column of a table with one query?
The question "SQL to find the number of distinct values in a column" covers a single, fixed column only.
In general, SQL expects the names of items (fields, tables, roles, indices, constraints, etc) in a statement to be constant. That many database systems let you examine the structure through something like information_schema does not mean you can plug that data into the running statement.
You can however use the information_schema to construct new SQL statements that you execute separately.
First consider your original problem.
CREATE TABLE foo (a numeric, b numeric, c numeric);
INSERT INTO foo(a,b,c)
VALUES (1,1,1), (1,1,2), (1,1,3), (1,2,1), (1,2,2);
SELECT COUNT(DISTINCT a) "distinct a",
COUNT(DISTINCT b) "distinct b",
COUNT(DISTINCT c) "distinct c"
FROM foo;
For the sample data above, this returns 1, 2 and 3. If you know the names of all your columns when you are writing the query, that is sufficient.
If you are seeking data for an arbitrary table, you need to construct the SQL statement via SQL (I've added plenty of whitespace so you can see the different levels involved):
SELECT 'SELECT ' || STRING_AGG(   'COUNT (DISTINCT '
                               || column_name
                               || ') "'
                               || column_name
                               || '"',
                               ',')
                 || ' FROM foo;'
FROM information_schema.columns
WHERE table_name='foo';
That, however, is just the text of the necessary SQL statement. Depending on how you are accessing PostgreSQL, it might be easy to feed that text into a new query. If you are keeping everything inside PostgreSQL, you will have to resort to one of the integrated procedural languages. An excellent (though complex) discussion of the issues may provide guidance.
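As a rough sketch of the client-side route, the Python below reads the column names, builds the COUNT(DISTINCT ...) statement as text, and then runs that text as a second query. To stay runnable without a server it assumes SQLite through the standard sqlite3 module, so the column lookup uses PRAGMA table_info rather than information_schema.columns. The helper names quote_name and distinct_counts are made up for illustration.

```python
import sqlite3


def quote_name(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def distinct_counts(conn: sqlite3.Connection, table: str) -> dict:
    """Return {column_name: number of distinct non-NULL values} for one table."""
    # Step 1: ask the catalog for the column names.
    # (Assumption: SQLite. Each PRAGMA table_info row holds the name at index 1.)
    info_rows = conn.execute(f"PRAGMA table_info({quote_name(table)})").fetchall()
    columns = [row[1] for row in info_rows]
    if not columns:
        raise ValueError(f"no such table or no columns: {table!r}")

    # Step 2: build the text of the real query, one COUNT(DISTINCT ...) per column.
    select_list = ", ".join(f"COUNT(DISTINCT {quote_name(c)})" for c in columns)
    statement = f"SELECT {select_list} FROM {quote_name(table)}"

    # Step 3: execute the generated statement as a separate query.
    result_row = conn.execute(statement).fetchone()
    return dict(zip(columns, result_row))


# Same sample data as the foo table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a numeric, b numeric, c numeric)")
conn.executemany(
    "INSERT INTO foo(a, b, c) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2)],
)
counts = distinct_counts(conn, "foo")
print(counts)  # {'a': 1, 'b': 2, 'c': 3}
```

With a PostgreSQL driver the same three steps should apply, with the column lookup replaced by the information_schema.columns query shown earlier. Quoting the identifiers matters because the column names end up inside the statement text rather than being passed as parameters.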