How does DISTINCT work when using JPA and Hibernate

What column does DISTINCT work with in JPA and is it possible to change it?

Here's an example JPA query using DISTINCT:

select DISTINCT c from Customer c

This doesn't make a lot of sense to me: which column is the DISTINCT based on? Is it specified on the entity as an annotation? I couldn't find one.

I would like to specify the column to make the distinction on, something like:

select DISTINCT(c.name) c from Customer c

I'm using MySQL and Hibernate.

asked Aug 28 '09 10:08

Steve Claridge

2 Answers

Depending on the underlying JPQL or Criteria API query type, DISTINCT has two meanings in JPA.

Scalar queries

For scalar queries, which return a scalar projection, like the following query:

List<Integer> publicationYears = entityManager
.createQuery(
    "select distinct year(p.createdOn) " +
    "from Post p " +
    "order by year(p.createdOn)", Integer.class)
.getResultList();

LOGGER.info("Publication years: {}", publicationYears);

The DISTINCT keyword should be passed to the underlying SQL statement because we want the DB engine to filter duplicates prior to returning the result set:

SELECT DISTINCT
    extract(YEAR FROM p.created_on) AS col_0_0_
FROM
    post p
ORDER BY
    extract(YEAR FROM p.created_on)

-- Publication years: [2016, 2018]
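As a rough in-memory analogue of what the database does for this scalar projection, the sketch below extracts the year from each value, drops duplicate years, and sorts the rest. The class name, method name, and sample dates are made up for illustration; the real filtering happens inside the DB engine, not in Java.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

class ScalarDistinctSketch {

    // In-memory analogue of:
    //   select distinct year(p.createdOn) from Post p order by year(p.createdOn)
    static List<Integer> distinctYears(List<LocalDate> createdOnValues) {
        return createdOnValues.stream()
            .map(LocalDate::getYear) // the scalar projection
            .distinct()              // duplicates are compared by value
            .sorted()                // order by
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the original answer.
        List<LocalDate> createdOn = List.of(
            LocalDate.of(2016, 8, 30),
            LocalDate.of(2018, 1, 15),
            LocalDate.of(2016, 10, 12),
            LocalDate.of(2018, 5, 30)
        );
        System.out.println(distinctYears(createdOn)); // [2016, 2018]
    }
}
```

The point to take away is that, for scalars, duplicates are defined by the selected values themselves, which is exactly what SQL DISTINCT compares.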

Entity queries

For entity queries, DISTINCT has a different meaning.

Without using DISTINCT, a query like the following one:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title",
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}",
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

is going to JOIN the post and the post_comment tables like this:

SELECT p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1, 1]

But the parent post records are duplicated in the result set for each associated post_comment row. For this reason, the List of Post entities will contain duplicate Post entity references.
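To see why the list ends up with repeated references, here is a minimal plain-Java sketch of the row-to-entity mapping. It is not Hibernate's actual code: the Post class, the mapRows method, and the {postId, commentId} row layout are simplified stand-ins chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinFetchDuplicatesSketch {

    // Hypothetical stand-in for the Post entity; only the identifier matters here.
    static final class Post {
        final long id;
        final List<Long> commentIds = new ArrayList<>();
        Post(long id) { this.id = id; }
    }

    // Each row is {postId, commentId}, mimicking the post/post_comment JOIN result.
    static List<Post> mapRows(long[][] rows) {
        Map<Long, Post> loaded = new HashMap<>(); // one instance per identifier
        List<Post> result = new ArrayList<>();
        for (long[] row : rows) {
            Post post = loaded.computeIfAbsent(row[0], Post::new);
            post.commentIds.add(row[1]);
            result.add(post); // one list entry per row, not per Post
        }
        return result;
    }

    public static void main(String[] args) {
        // One post with two comments yields two joined rows.
        List<Post> posts = mapRows(new long[][] {{1, 1}, {1, 2}});
        System.out.println(posts.size());                 // 2
        System.out.println(posts.get(0) == posts.get(1)); // true
    }
}
```

Both list entries point to the same object, so the duplicates are duplicate references to one entity rather than separate copies of it.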

To eliminate the duplicate Post entity references, we need to use DISTINCT:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title",
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}",
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

But then DISTINCT is also passed to the SQL query, and that's not desirable at all:

SELECT DISTINCT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

When DISTINCT is passed to the SQL query, the execution plan includes an extra Sort phase. It adds overhead without bringing any value, since the parent-child combinations are always unique because of the child PK column:

Unique  (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
  ->  Sort  (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
        Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
              Hash Cond: (pc.post_id = p.id)
              ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
              ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
                          Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
                          Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms

Entity queries with HINT_PASS_DISTINCT_THROUGH

To eliminate the Sort phase from the execution plan, we need to set the Hibernate-specific HINT_PASS_DISTINCT_THROUGH query hint to false via the JPA setHint method:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title",
    "High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}",
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

And now, the SQL query will not contain DISTINCT but Post entity reference duplicates are going to be removed:

SELECT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]
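With DISTINCT kept out of the SQL, the duplicate removal has to happen on the Java side, after the rows have been mapped to entities. Conceptually, it amounts to an order-preserving filter on object references. The sketch below illustrates that idea only; the class and method names are hypothetical and this is not Hibernate's internal implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReferenceDistinctSketch {

    // Keeps the first occurrence of each object reference, preserving list order.
    static <T> List<T> distinctByReference(List<T> entities) {
        // Identity-based set: membership is decided by ==, not by equals().
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<T> result = new ArrayList<>();
        for (T entity : entities) {
            if (seen.add(entity)) { // add() returns false for an already-seen reference
                result.add(entity);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Plain objects stand in for managed entities in this illustration.
        Object post1 = new Object();
        Object post2 = new Object();
        List<Object> fetched = List.of(post1, post1, post2);
        System.out.println(distinctByReference(fetched).size()); // 2
    }
}
```

This pass is linear in the number of fetched rows and needs no sorting, which is why dropping DISTINCT from the SQL costs nothing in correctness.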

And the Execution Plan is going to confirm that we no longer have an extra Sort phase this time:

Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
  Hash Cond: (pc.post_id = p.id)
  ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
  ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
              Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
              Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms

answered Sep 22 '22 12:09

Vlad Mihalcea

You are close.

select DISTINCT(c.name) from Customer c

answered Sep 18 '22 12:09

agelbess
