
Hibernate thread-safe idempotent upsert without constraint exception handling?

I have some code that performs an UPSERT, also known as a merge. I want to clean up this code: specifically, I want to move away from exception handling and reduce the verbosity and sheer complexity of the code for such a simple operation. The requirement is to insert each item unless it already exists:

public void batchInsert(IncomingItem[] items) {
    try(Session session = sessionFactory.openSession()) {
        batchInsert(session, items);
    }
    catch(PersistenceException e) {
        if(e.getCause() instanceof ConstraintViolationException) {
            logger.warn("attempting to recover from constraint violation");
            DateTimeFormatter dbFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
            items = Arrays.stream(items).filter(item -> {
                int n = db.queryForObject("select count(*) from rets where source = ? and systemid = ? and updtdate = ?::timestamp",
                        Integer.class,
                        item.getSource().name(), item.getSystemID(), 
                        dbFormat.format(item.getUpdtDateObj()));
                if(n != 0) {
                    logger.warn("REMOVED DUPLICATE: " +
                            item.getSource() + " " + item.getSystemID() + " " + item.getUpdtDate());
                    return false;
                }
                else {
                    return true; // keep
                }
            }).toArray(IncomingItem[]::new);
            try(Session session = sessionFactory.openSession()) {
                batchInsert(session, items);
            }
        }
        else {
            throw e; // not a constraint violation: rethrow instead of silently swallowing it
        }
    }
}

An initial search of Stack Overflow was unsatisfactory:

  • Hibernate Idempotent Update - conceptually similar, but a much simpler scenario with no regard for multi-threading or multi-processing.
  • Can Hibernate work with MySQL's "ON DUPLICATE KEY UPDATE" syntax? - much better; it removes the race condition by pushing atomicity to the database using the @SQLInsert annotation. Unfortunately, this solution is too error-prone to use on wider tables and too maintenance-intensive in evolving applications.
  • How to mimic upsert behavior using Hibernate? - very similar to the above question, with a similar answer.
  • Hibernate + "ON DUPLICATE KEY" logic - same as above; the answer mentions merge(), which is OK when single-threaded.
  • Bulk insert or update with Hibernate? - a similar question, but the chosen answer goes off the rails by using stored procedures.
  • Best way to prevent unique constraint violations with JPA - again, a very naive, single-thread-oriented question and answers.

In the question How to do ON DUPLICATE KEY UPDATE in Spring Data JPA?, which was marked as a duplicate, I noticed this intriguing comment: [screenshot of the comment; image not available]

That was a dead end: despite it sounding like a clever solution and its mention of the "actual same SQL statement", I really don't understand the comment.

Another promising approach is this: Hibernate and Spring modify query Before Submitting to DB

ON CONFLICT DO NOTHING / ON DUPLICATE KEY UPDATE

Both major open-source databases, PostgreSQL and MySQL, support a mechanism to push idempotency down to the database. The examples below use the PostgreSQL syntax but can be easily adapted for MySQL.
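To illustrate how the clause differs between the two databases, here is a minimal sketch that appends a do-nothing conflict clause to an INSERT statement. The class, the `Db` enum, and the method name are hypothetical. Using a self-assignment (`col = col`) as the MySQL no-op is an assumption of this sketch, not something taken from the linked questions.

```java
final class ConflictClauseSketch {

    /** Hypothetical marker for the target database. */
    enum Db { POSTGRESQL, MYSQL }

    /**
     * Appends a clause that turns a duplicate-key insert into a no-op.
     * anyColumn is only used for MySQL, where assigning a column to itself
     * leaves the existing row unchanged.
     */
    static String makeIdempotent(String insertSql, Db db, String anyColumn) {
        switch (db) {
            case POSTGRESQL:
                return insertSql + " ON CONFLICT DO NOTHING";
            case MYSQL:
                return insertSql + " ON DUPLICATE KEY UPDATE " + anyColumn + " = " + anyColumn;
            default:
                throw new IllegalArgumentException("unsupported database: " + db);
        }
    }
}
```

Both variants rely on a unique constraint or primary key on the table; without one, the database has no conflict to detect.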

By following the ideas in Hibernate and Spring modify query Before Submitting to DB, Hooking into Hibernate's query generation, and How I can configure StatementInspector in Hibernate?, I implemented:

import org.hibernate.resource.jdbc.spi.StatementInspector;

@SuppressWarnings("serial")
public class IdempotentInspector implements StatementInspector {

    @Override
    public String inspect(String sql) {
        if(sql.startsWith("insert into rets")) {
            sql += " ON CONFLICT DO NOTHING";
        }
        return sql;
    }

}

with property

        <prop key="hibernate.session_factory.statement_inspector">com.myapp.IdempotentInspector</prop>

Unfortunately this leads to the following error when a duplicate is encountered:

Caused by: org.springframework.orm.hibernate5.HibernateOptimisticLockingFailureException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1; nested exception is org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1

This makes sense if you think about what's going on under the covers: ON CONFLICT DO NOTHING causes zero rows to be inserted, but Hibernate expects each insert to affect exactly one row.

Is there a solution that enables thread-safe exception-free concurrent idempotent inserts and doesn't require manually defining the entire SQL insert statement to be executed by Hibernate?

For what it's worth, I feel that the approaches that push the duplicate check down to the database are the path to a proper solution.

CLARIFICATION: The IncomingItem objects consumed by the batchInsert method originate from a system where records are immutable. Under this special condition, ON CONFLICT DO NOTHING behaves the same as an UPSERT, notwithstanding possible loss of the Nth update.

Alex R asked Jun 05 '19 02:06



1 Answer

Short answer: Hibernate does not support it out of the box (as confirmed by a Hibernate guru in this blog post). You could probably make it work to some extent in some scenarios with the mechanisms you already described, but using native queries directly looks like the most straightforward approach to me for this purpose.
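As one possible shape for the native-query route, the sketch below uses plain JDBC batching with PostgreSQL's ON CONFLICT DO NOTHING. The table and its three columns are taken from the duplicate-check query in the question. The class, the `Row` holder, and the method names are made up for illustration, and the real `rets` table presumably has more columns. The sketch also assumes a unique constraint covering (source, systemid, updtdate).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.List;

final class IdempotentInsertSketch {

    // Only the key columns from the question's query; extend for a wider table.
    static final String SQL =
        "insert into rets (source, systemid, updtdate) values (?, ?, ?) "
        + "on conflict do nothing";

    /** Hypothetical minimal holder for one row's key columns. */
    static final class Row {
        final String source;
        final String systemId;
        final Timestamp updtDate;

        Row(String source, String systemId, Timestamp updtDate) {
            this.source = source;
            this.systemId = systemId;
            this.updtDate = updtDate;
        }
    }

    /** Inserts all rows in one batch; duplicates are skipped by the database. */
    static int insertAll(Connection conn, List<Row> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (Row r : rows) {
                ps.setString(1, r.source);
                ps.setString(2, r.systemId);
                ps.setTimestamp(3, r.updtDate);
                ps.addBatch();
            }
            return countInserted(ps.executeBatch());
        }
    }

    /** Sums positive update counts; 0 (skipped) and negative status codes are ignored. */
    static int countInserted(int[] updateCounts) {
        int inserted = 0;
        for (int c : updateCounts) {
            if (c > 0) {
                inserted += c;
            }
        }
        return inserted;
    }
}
```

From Hibernate, a method like this could be invoked through `session.doReturningWork(conn -> IdempotentInsertSketch.insertAll(conn, rows))`, which hands the session's JDBC connection to the callback, so no entity row-count check is involved. The inserted rows do not become managed entities this way. Some driver settings can make `executeBatch()` report `Statement.SUCCESS_NO_INFO` instead of real counts, in which case the returned total is only a lower bound.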

The longer answer is that it would be hard to support, I guess, considering all the aspects of Hibernate it touches, e.g.:

  • What should happen to instances for which duplicates are found, given that they are supposed to become managed after persisting? Merge them into the persistence context?
  • What should happen to associations that have already been persisted, and which cascade operations should apply to them (persist/merge/something_new; or is it too late at that point to make that decision)?
  • Do the databases return enough information from upsert operations to cover all use cases (skipped rows, generated keys for non-skipped rows in batch insert modes, etc.)?
  • What about @Audited entities: are they created or updated, and if updated, what has changed?
  • What about versioning and optimistic locking (where, by definition, you actually want an exception in that case)?

Even if Hibernate supported it in some way, I'm not sure I'd use that feature if there were too many caveats to watch out for.

So, the rule of thumb I follow is:

  • For simple scenarios (which is most of the time): persist + retry. Retries on specific errors (selected by exception type or similar) can be configured globally with AOP-like approaches (annotations, custom interceptors and similar), depending on which frameworks you use in your project. This is good practice anyway, especially in distributed environments.
  • For complex scenarios and performance-intensive operations (especially batching, very complex queries and the like): native queries, to make the most of database-specific features.
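The persist + retry rule above could be sketched, without any framework, as a small helper like the following. The class and method names are hypothetical; in a real project an AOP-based retry facility from your framework would likely replace it.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

final class RetrySketch {

    /**
     * Runs the action up to maxAttempts times. An exception is retried only
     * if the predicate accepts it; anything else propagates immediately.
     */
    static <T> T withRetry(int maxAttempts,
                           Predicate<RuntimeException> retryable,
                           Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Assumption: the action opens its own session/transaction and
                // re-checks for existing rows on every attempt.
                return action.get();
            } catch (RuntimeException e) {
                if (!retryable.test(e)) {
                    throw e;
                }
                last = e; // remember the failure and try again
            }
        }
        throw last; // all attempts failed with a retryable error
    }
}
```

With the code from the question, the predicate might be `e -> e.getCause() instanceof ConstraintViolationException`. Each attempt should use a fresh session, since a session that has thrown an exception should be discarded rather than reused.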
Dragan Bozanovic answered Nov 16 '22 00:11
