Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When is sql distinct faster than java programming 'distinct'

If I have a sql query that uses 'distinct' (in oracle), would it be faster than retrieving the non-distinct then getting the unique results via java programming?

I heard somewhere that oracle sql distinct is heavy, but is it heavier than manual 'distinction' via java programming?

Thanks, Franz

like image 481
Franz See Avatar asked Jun 04 '09 10:06

Franz See


People also ask

Why we should not use distinct in SQL?

Summary. DISTINCT makes a query return unique rows only, and is often used injudiciously to suppress duplicate rows being returned by a bad query. Don't use DISTINCT to cover up errors.

Does distinct affect performance?

Yes, the application needs to compare every record to the "distinct" records cache as it goes. You can improve performance by using an index, particularly on the numeric and date fields.

Is SQL distinct slow?

Most of the SELECT DISTINCT queries will perform exactly as fast as their simple SELECT counterparts, because the optimizer will do away with the step necessary for eliminating duplicates.

Does distinct make queries slower?

For MAX and MIN , you probably shouldn't ever use DISTINCT because the results will be the same as without DISTINCT , and the DISTINCT function will make your query substantially slower to return results.


2 Answers

The rule of the thumb is that the data is faster to process in the database than in your programming language.

The reason is that the data is already available in the database, and it saves the effort of going to your app:

  • marshalling data for the driver from the database;
  • network transfer;
  • unmarshalling data from the driver to the application;

As for Oracle DISTINCT being heavy, what it can mean is to never simply throw a DISTINCT at a query just because it seems like a good idea - profile it with a realistic data set since it can have serious performance implications.

like image 183
Robert Munteanu Avatar answered Sep 30 '22 02:09

Robert Munteanu


Two main aspects:

  • If you have to transfer the data over to Java, there's all the overhead of doing that transfer. Doing the work in the database means you don't need to transfer data you don't need.
  • The database will be able to use its indexes, caches etc to speed things up.

I'd be very surprised to find that fetching all the data and then doing a distinct operation in Java was faster than doing it in the database.

like image 21
Jon Skeet Avatar answered Sep 30 '22 00:09

Jon Skeet