Is my understanding of "select distinct" correct?

Tags:

We recently discovered a performance problem with one of our systems and I think I have the fix but I'm not certain my understanding is correct.

In simplest form, we have a table blah into which we accumulate various values based on a key field. The basic form is:

recdate   date
rectime   time
system    varchar(20)
count     integer
accum1    integer
accum2    integer

There are a lot more accumulators than that but they're all of the same form. The primary key is made up of recdate, rectime and system.

As values are collected to the table, the count for a given recdate/rectime/system is incremented and the values for that key are added to the accumulators. That means the averages can be obtained by using accumN / count.

Now we also have a view over that table specified as follows:

create view blah_v (
    recdate, rectime, system, count,
    accum1,
    accum2
) as select distinct
    recdate, rectime, system, count,
    value (case when count > 0 then accum1 / count end, 0),
    value (case when count > 0 then accum2 / count end, 0)
    from blah;

In other words, the view gives us the average value of the accumulators rather than the sums. It also makes sure we don't get a divide-by-zero in those cases where the count is zero (these records do exist and we are not allowed to remove them so don't bother telling me they're rubbish - you're preaching to the choir).

We've noticed that the time difference between doing:

select distinct recdate from XX

varies greatly depending on whether we use the table or the view. I'm talking about the difference being 1 second for the table and 27 seconds for the view (with 100K rows).

We actually tracked it back to the select distinct. What seems to be happening is that the DBMS is actually loading all the rows in and sorting them so as to remove duplicates. That's fair enough, it's what we stupidly told it to do.

But I'm pretty sure the fact that the view includes every component of the primary key means that it's impossible to have duplicates anyway. We've validated the problem since, if we create another view without the distinct, it performs at the same speed as the underlying table.

I just wanted to confirm my understanding that a select distinct can not have duplicates if it includes all the primary key components. If that's so, then we can simply change the view appropriately.

455

asked May 03 '10 07:05

paxdiablo

2 Answers

In this case the DISTINCT isn't doing you any good because your distinct fields are already guaranteed to be unique because of the PRIMARY KEY constraint on the underlying table. You might try rewriting the view as:

create view blah_v ( 
    recdate, rectime, system, count, 
    accum1, 
    accum2 
) as select
       recdate, rectime, system, count, 
       case when count > 0 then accum1 / count else 0 end,
       case when count > 0 then accum2 / count else 0 end,
       from blah;

Share and enjoy.

166

answered Sep 23 '22 13:09

Bob Jarvis - Слава Україні

Yes, there's no point in asking for distinct results when all the primary key elements are included.

The table's primary key constraint already precludes duplicates across those columns, but your DBMS will still process the tuples to ensure they are distinct.

answered Sep 23 '22 13:09

John Flatness

Related questions
                            
                                Find the record count of the derived table in SQL Server
                            
                                How to extract month name on a string datatype on athena
                            
                                Improve PostgresSQL aggregation query performance
                            
                                How do I create unique constraint for multiple columns?
                            
                                What's optimal? UNION vs WHERE IN (str1, str2, str3)
                            
                                REPLACE and Unicode characters in SQL
                            
                                Choosing ISAM rather than SQL
                            
                                ASP.NET MVC TDD with LINQ and SQL database
                            
                                SQL to transpose row pairs to columns in MS ACCESS database
                            
                                How can I make a stored procedure return a "dataset" using a parameter I pass?
                            
                                Migrating MySQL to a table with different structure
                            
                                MS ACCESS: How can i count distinct value using access query?
                            
                                Query Multiple databases with single ado.net query
                            
                                TSQL - Help with UNPIVOT
                            
                                Is there a way to run a stored procedure at predefined intervals?
                            
                                Is it bad to not use a DB but use in memory objects?
                            
                                Difference between Cluster and Non-cluster index in SQL
                            
                                Best way to store sales tax information
                            
                                SQL Join Tables
                            
                                Algorithm: Build a recommendation for movies you might like

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is my understanding of "select distinct" correct?

Tags:

performance

sql

select

distinct

db2

paxdiablo

People also ask

2 Answers

Bob Jarvis - Слава Україні

John Flatness

Recent Activity

Donate For Us