Every day, the requests get weirder and weirder.
I have been asked to put together a query to detect which columns in a table contain the same value for all rows. I said, "That needs to be done by a program, so that we can do it in one pass of the table instead of N passes."
I have been overruled.
So, long story short: I have this very simple query that demonstrates the problem. It makes four passes over the test set. I am looking for ideas for SQL magic that do not involve adding indexes on every column, writing a program, or taking a full human lifetime to run.
And, sigh, it needs to be able to work on any table.
Thanks in advance for your suggestions.
WITH TEST_CASE AS
(
SELECT 'X' A, 5 B, 'FRI' C, NULL D FROM DUAL UNION ALL
SELECT 'X' A, 3 B, 'FRI' C, NULL D FROM DUAL UNION ALL
SELECT 'X' A, 7 B, 'TUE' C, NULL D FROM DUAL
),
KOUNTS AS
(
SELECT SQRT(COUNT(*)) S, 'Column A' COLUMNS_WITH_SINGLE_VALUES
FROM TEST_CASE P, TEST_CASE Q
WHERE P.A = Q.A OR (P.A IS NULL AND Q.A IS NULL)
UNION ALL
SELECT SQRT(COUNT(*)) S, 'Column B' COLUMNS_WITH_SINGLE_VALUES
FROM TEST_CASE P, TEST_CASE Q
WHERE P.B = Q.B OR (P.B IS NULL AND Q.B IS NULL)
UNION ALL
SELECT SQRT(COUNT(*)) S, 'Column C' COLUMNS_WITH_SINGLE_VALUES
FROM TEST_CASE P, TEST_CASE Q
WHERE P.C = Q.C OR (P.C IS NULL AND Q.C IS NULL)
UNION ALL
SELECT SQRT(COUNT(*)) S, 'Column D' COLUMNS_WITH_SINGLE_VALUES
FROM TEST_CASE P, TEST_CASE Q
WHERE P.D = Q.D OR (P.D IS NULL AND Q.D IS NULL)
)
SELECT COLUMNS_WITH_SINGLE_VALUES
FROM KOUNTS
WHERE S = (SELECT COUNT(*) FROM TEST_CASE)
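(Why the SQRT works: with N rows, the self-join on a column produces N * N matches exactly when every row carries the same value, so SQRT(COUNT(*)) = N flags a single-valued column. For column A above that is 3 * 3 = 9 matches and SQRT(9) = 3 = COUNT(*); column B gets only its 3 self-matches, and SQRT(3) is about 1.73.)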
Do you mean something like this?
WITH
TEST_CASE AS
(
SELECT 'X' A, 5 B, 'FRI' C, NULL D FROM DUAL UNION ALL
SELECT 'X' A, 3 B, 'FRI' C, NULL D FROM DUAL UNION ALL
SELECT 'X' A, 7 B, 'TUE' C, NULL D FROM DUAL
)
select case when min(A) = max(A) THEN 'A'
when min(B) = max(B) THEN 'B'
when min(C) = max(C) THEN 'C'
when min(D) = max(D) THEN 'D'
else 'No one'
end
from TEST_CASE
Edit: the version above reports only the first matching column, because CASE stops at the first true branch, and MIN/MAX ignore NULLs, so the all-NULL column D is missed entirely. This works:
WITH
TEST_CASE AS
(
SELECT 'X' A, 5 B, 'FRI' C, NULL D FROM DUAL UNION ALL
SELECT 'X' A, 3 B, 'FRI' C, NULL D FROM DUAL UNION ALL
SELECT 'X' A, 7 B, 'TUE' C, NULL D FROM DUAL
)
select case when min(nvl(A,0)) = max(nvl(A,0)) THEN 'A ' end ||
case when min(nvl(B,0)) = max(nvl(B,0)) THEN 'B ' end ||
case when min(nvl(C,0)) = max(nvl(C,0)) THEN 'C ' end ||
case when min(nvl(D,0)) = max(nvl(D,0)) THEN 'D ' end c
from TEST_CASE
Bonus: I have also added a check for NULL values, so the result is now: A and D.
And the SQLFiddle demo for you.
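One caveat with NVL(col, 0): it folds NULL into the same bucket as a real 0 (or '0' after implicit conversion), so a column containing only zeros would be reported exactly like an all-NULL column. A minimal sketch of a safer variant, assuming CHR(0) never occurs in the real data and accepting TO_CHAR's default formats (both are my assumptions, not part of the original answer):
WITH
TEST_CASE AS
(
SELECT 'X' A, 5 B, 'FRI' C, NULL D FROM DUAL UNION ALL
SELECT 'X' A, 3 B, 'FRI' C, NULL D FROM DUAL UNION ALL
SELECT 'X' A, 7 B, 'TUE' C, NULL D FROM DUAL
)
select case when min(nvl(to_char(A), chr(0))) = max(nvl(to_char(A), chr(0))) THEN 'A ' end ||
       case when min(nvl(to_char(B), chr(0))) = max(nvl(to_char(B), chr(0))) THEN 'B ' end ||
       case when min(nvl(to_char(C), chr(0))) = max(nvl(to_char(C), chr(0))) THEN 'C ' end ||
       case when min(nvl(to_char(D), chr(0))) = max(nvl(to_char(D), chr(0))) THEN 'D ' end c
from TEST_CASE
Comparing everything as strings is safe here: MIN = MAX still holds exactly when the column has a single value.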
Optimizer statistics can easily identify columns with more than one distinct value. After statistics are gathered a simple query against the data dictionary will return the results almost instantly.
The results will only be accurate on 10g if you use ESTIMATE_PERCENT = 100. The results will be accurate on 11g+ if you use ESTIMATE_PERCENT = 100 or AUTO_SAMPLE_SIZE.
Code
create table test_case(a varchar2(1), b number, c varchar2(3),d number,e number);
--I added a new test case, E. E has null and not-null values.
--This is a useful test because null and not-null values are counted separately.
insert into test_case
SELECT 'X' A, 5 B, 'FRI' C, NULL D, NULL E FROM DUAL UNION ALL
SELECT 'X' A, 3 B, 'FRI' C, NULL D, NULL E FROM DUAL UNION ALL
SELECT 'X' A, 7 B, 'TUE' C, NULL D, 1 E FROM DUAL;
--Gather stats with default settings, which uses AUTO_SAMPLE_SIZE.
--One advantage of this method is that you can quickly get information for many
--tables at one time.
begin
dbms_stats.gather_schema_stats(user);
end;
/
--Columns where every row holds the same value (at most one distinct value).
--Note that nulls and not-nulls are counted differently:
--NUM_DISTINCT counts distinct not-null values, NUM_NULLS counts all nulls.
select owner, table_name, column_name
from dba_tab_columns
where owner = user
and num_distinct + least(num_nulls, 1) <= 1
order by column_name;
OWNER TABLE_NAME COLUMN_NAME
------- ---------- -----------
JHELLER TEST_CASE A
JHELLER TEST_CASE D
Performance
On 11g, this method might be about as fast as mucio's SQL statement. Options like cascade => false
would improve performance by not analyzing indexes.
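A minimal sketch of that option, assuming the TEST_CASE table from above (gather statistics for one table and skip its indexes):
begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'TEST_CASE',
    cascade => false  --do not analyze the table's indexes
  );
end;
/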
But the great thing about this method is that it also produces useful statistics. If the system is already gathering statistics at regular intervals the hard work may already be done.
Details about the AUTO_SAMPLE_SIZE algorithm
AUTO_SAMPLE_SIZE was completely changed in 11g. It does not use sampling to estimate the number of distinct values (NDV). Instead it scans the whole table and uses a hash-based distinct algorithm. This algorithm does not require large amounts of memory or temporary tablespace, and it is much faster to read the whole table than to sort even a small part of it. The Oracle Optimizer blog has a good description of the algorithm here. For even more details, see this presentation by Amit Podder. (You'll want to scan through that PDF if you want to verify the details in my next section.)
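As a quick sanity check, a sketch reusing the TEST_CASE table from above: on a small table the NDV recorded in the dictionary should match an exact count.
--NDV and null counts recorded by DBMS_STATS:
select column_name, num_distinct, num_nulls
from user_tab_columns
where table_name = 'TEST_CASE'
order by column_id;
--Exact value for one column, for comparison:
select count(distinct b) from test_case;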
Possibility of a wrong result
Although the new algorithm does not use simple sampling, it still does not count the number of distinct values with 100% accuracy. It's easy to find cases where the estimated number of distinct values differs from the actual number. But if the estimates are sometimes clearly inaccurate, how can they be trusted in this solution?
The potential inaccuracy comes from two sources: hash collisions and synopsis splitting. Synopsis splitting is the main source of inaccuracy, but it does not apply here. It only happens when there are more than 13,864 distinct values, and it never throws out all of the values; the final estimate will certainly be much larger than 1.
The only real concern is the chance of two distinct values producing a hash collision. With a 64-bit hash the chance could be as low as 1 in 18,446,744,073,709,551,616. Unfortunately I don't know the details of the hashing algorithm, so I don't know the real probability. I was unable to produce any collisions in some simple testing or in previous real-life tests. (One of my tests was to use large values, since some statistics operations only use the first N bytes of data.)
Now also consider that this will only happen if all of the distinct values in the table collide. What are the chances of there being a table with only two values that just happen to collide? Probably much less than the chance of winning the lottery and getting struck by a meteorite at the same time.
If you can live with the result on a single line, this should only scan once:
WITH TEST_CASE AS
(
SELECT 'X' A, 5 B, 'FRI' C, NULL D FROM DUAL UNION ALL
SELECT 'X' A, 3 B, 'FRI' C, NULL D FROM DUAL UNION ALL
SELECT 'X' A, 7 B, 'TUE' C, NULL D FROM DUAL
)
SELECT
CASE WHEN COUNT(DISTINCT A) +
COUNT(DISTINCT CASE WHEN A IS NULL THEN 1 END) = 1
THEN 1 ELSE 0 END SAME_A,
CASE WHEN COUNT(DISTINCT B) +
COUNT(DISTINCT CASE WHEN B IS NULL THEN 1 END) = 1
THEN 1 ELSE 0 END SAME_B,
CASE WHEN COUNT(DISTINCT C) +
COUNT(DISTINCT CASE WHEN C IS NULL THEN 1 END) = 1
THEN 1 ELSE 0 END SAME_C,
CASE WHEN COUNT(DISTINCT D) +
COUNT(DISTINCT CASE WHEN D IS NULL THEN 1 END) = 1
THEN 1 ELSE 0 END SAME_D
FROM TEST_CASE
An SQLfiddle to test with.
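Since the OP needs this to work on any table, here is a sketch that generates that one-scan query from the data dictionary. It assumes 11gR2+ for LISTAGG, a real target table named TEST_CASE in your own schema, and no LONG or LOB columns (COUNT(DISTINCT) cannot handle those); none of this is from the original answer:
SELECT 'SELECT ' ||
       LISTAGG('CASE WHEN COUNT(DISTINCT ' || COLUMN_NAME || ') + ' ||
               'COUNT(DISTINCT CASE WHEN ' || COLUMN_NAME || ' IS NULL THEN 1 END) = 1 ' ||
               'THEN 1 ELSE 0 END SAME_' || COLUMN_NAME, ', ')
         WITHIN GROUP (ORDER BY COLUMN_ID) ||
       ' FROM ' || TABLE_NAME GENERATED_SQL
FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'TEST_CASE' --hypothetical target table
GROUP BY TABLE_NAME;
Running the statement this returns produces the SAME_* flags in a single scan of the target table.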