Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database: Insert new rows or update existing ones?

Object-oriented design encourages the use of immutable objects to improve thread-safety and performance. I'm wondering whether this carries over to relational databases.

Am I better off updating existing rows, or inserting new ones that act as overrides?

  • Use-case
    • Each employee is associated with exactly one company
    • Employees change their company over time.
    • The employee name should be unique.
  • Schema
    • Employee[name, company]

Option 1: Every time an employee changes companies, insert a new Employee[name, company] row. The application is instructed to skip older rows (which are pruned up in a background thread over time). Option 2: Every time an employee changes companies, update the existing row.

Option 1 reminds me of immutable objects in that it's thread-safe (no need for locks). On the other hand, every time the employee changes companies, I have to clone all associated objects and point them at the new record. Furthermore, it's not clear how to prevent duplicate Employees from getting created by mistake.

Option 2 makes it easy to prevent duplicate employees but has the downside of returning potentially inconsistent associations in READ_COMMITTED transaction isolation.

like image 412
Gili Avatar asked Feb 12 '11 22:02

Gili


People also ask

Is update better than delete and insert?

For best future query performance, it's better to do an update to keep the same extents. Delete and insert will not necessarily use the same extents. For a table of that size, it would be unlikely to do so. Furthermore, delete can leave "holes" in your data.

What is the difference between insert and update?

The main difference between INSERT and UPDATE in SQL is that INSERT is used to add new records to the table while UPDATE is used to modify the existing records in the table. Relational Database Management System (RDBMS) is a program that allows storing and managing relational databases.

Which is faster insert or update?

Is update or INSERT faster? Inserts will just about always be quicker, especially if they are either in order or if the underlying table doesn't have a clustered index.

How do you insert or update a row in SQL?

INSERT OR UPDATE table SET column = value, column2 = value2, ... inserts or updates a row of values by explicitly setting the values of specific columns. This statement performs the same operation as in the INSERT OR UPDATE table (column, column2, ...) VALUES (value, value2, ...) syntax.


3 Answers

I am posting this in the hopes that this helps others in the future. I have personally wasted countless days going down this (wrong) path.

Immutable Objects are meant for value-types (think Integer, a timestamp, a temperature reading, etc). They are types that will never change. The moment you start talking about modifying the values of Immutable Objects it is a pretty strong indication that you are going down the wrong path. When you use genuine Immutable Objects you should never have to update references of associated objects.

So the correct answer, whether for object-oriented programming or database design, is to update mutable objects in-place.

UPDATE: marc_s mentions the fact that some systems require an immutable audit trail. I suggest splitting the system into two. The main table updates data in-place while inserting copies into a separate audit table. This has two advantages:

  1. The main table can take advantage of integrity checks (i.e. "employee names must be unique").
  2. The main table remains very fast for reads and the bigger/slower audit table can be trimmed over time.

This allows you to enjoy the best of both worlds.

like image 54
Gili Avatar answered Sep 26 '22 12:09

Gili


Those aren't options. They're completely different things, and they require completely different tables. The painful part is that the data from the tables might look exactly the same. Here's how to tell them apart.

Each table in a relational database has one and only one predicate. The predicate determines what the rows in the table mean. So a table whose data looks like this

Name    Company
--
Gili    Microsoft
Marc    Oracle

might mean

Person named "Gili" is currently an employee of company "Microsoft".
Person named "Marc" is currently an employee of company "Oracle".

Such a table would exclude consultants, because they're not employees. (In the USA they're not, anyway.)

But it might mean

Person named "Gili" once was an employee of company "Microsoft".
Person named "Marc" once was an employee of company "Oracle".

and that table would also allow

Person named "Gili" once was an employee of company "Oracle".

Different predicates, different tables. (That means you have to construct the tables differently to capture the meaning of the predicate.)

What you can't have is a table that means

Person named "Gili" is currently an employee of company "Microsoft".
Person named "Marc" once was an employee of company "Oracle".

Two different predicates in one table. Can't do it in a relational system.

So, if your predicate boils down to this

Person named NAME is currently an employee of company COMPANY.

then you must update COMPANY when the person changes employers. And if your predicate boils down to this

Person named NAME once was an employee of company COMPANY.

then you must insert a new row when the person changes employers.

like image 26
Mike Sherrill 'Cat Recall' Avatar answered Sep 24 '22 12:09

Mike Sherrill 'Cat Recall'


In general, Data Warehousing tends to follow the "insert only" pattern. The reason is that obsolete rows in diomension tables are still needed to place old facts in the context that existed when they were new facts.

Example: Pennsylvania was part of the Northeast sales region until January 1, when it became part of the Middle Atlantic Sales Region. A sale that was made last December needs to refer back to a row in the geographic dimension table that places it in the northeast region. An update in place of the "State" dimension table would invalidate this old fact.

The tendency in OLTP databases is to perform updates in place, and only keep track of what currently is the case. However, this may result in copying some data into transaction rows. For example, The invoice detail row in a purchase order system may contain the price of the item ordered, copied from the row in the products table. That way, if the price gets updated in the products table, the price affecting this invoice doesn't get clobbered.

like image 25
Walter Mitty Avatar answered Sep 22 '22 12:09

Walter Mitty