Inserting / updating a row based on its availability in the database

This is a common scenario, but I wanted to find out which approach is the most performance-optimized and what the best practice is.

I have a table with 4 columns: id, name, and two other fields. id is the PK and name is a unique key. I'm reading data from an Excel file, populating the values of each row into a domain object and then saving it. When saving, I want to check whether a record already exists for the same name: if it exists, I want to update it; otherwise, save it as a new record.

I could do this with a normal select query on the name and a check for null, and insert or update based on that, but I have thousands of rows to read from the Excel files, and performance is one of the stated non-functional requirements.
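For reference, here's roughly the naive approach I had in mind (just a sketch; the Row type, the excelRows list, the mytable name and the getConnection() helper are placeholders for my own code):

import java.sql.*;
import java.util.List;

// Naive approach: one SELECT plus one INSERT or UPDATE per spreadsheet row.
try (Connection conn = getConnection();   // placeholder helper
     PreparedStatement select = conn.prepareStatement(
         "select 1 from mytable where name = ?");
     PreparedStatement update = conn.prepareStatement(
         "update mytable set otherfield1 = ?, otherfield2 = ? where name = ?");
     PreparedStatement insert = conn.prepareStatement(
         "insert into mytable (id, name, otherfield1, otherfield2) values (?, ?, ?, ?)")) {
    for (Row row : excelRows) {           // excelRows: the parsed Excel data
        select.setString(1, row.getName());
        try (ResultSet rs = select.executeQuery()) {
            if (rs.next()) {
                // Record with this name exists: update it
                update.setString(1, row.getOtherField1());
                update.setString(2, row.getOtherField2());
                update.setString(3, row.getName());
                update.executeUpdate();
            } else {
                // No such record yet: insert a new one
                insert.setLong(1, row.getId());
                insert.setString(2, row.getName());
                insert.setString(3, row.getOtherField1());
                insert.setString(4, row.getOtherField2());
                insert.executeUpdate();
            }
        }
    }
}

That's two statements per row, which is why I'm worried about performance.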

So please advise me: which is the best way to handle this scenario? I haven't started coding my persistence-layer part yet, so I can switch to an ORM or plain JDBC according to your suggestion.

Edit: If I use name as the primary key, then I think I could use saveOrUpdate or merge from an ORM to fulfill my need. Is that a good idea? Thanks & regards, Prasath.

asked Apr 11 '12 by popcoder
1 Answer

I think the fastest way would be to carry out all the inserts/updates in the database itself rather than connecting to it and running a large number of statements.

Note: this is Oracle-specific, but other databases may have similar concepts.

I would use the following approach: first save the Excel data as a CSV file on the database server (/mydatadir/mydata.csv), then set up an external table in Oracle:

create or replace directory data_dir as '/mydatadir/';
create table external_table (
  id number(18),
  name varchar2(30),
  otherfield1 varchar2(40),
  otherfield2 varchar2(40))
organization external (
  type oracle_loader
  default directory data_dir
  access parameters
  ( fields terminated by ',' )
  location ('mydata.csv')
)

(Note that the external table wouldn't have to be set up every time.)

Then you can use the following command to merge the data into your table:

merge into yourtable t
using external_table e
on (t.name = e.name)
when matched then
   update set t.id = e.id, 
              t.otherfield1 = e.otherfield1, 
              t.otherfield2 = e.otherfield2
when not matched then
   insert (t.id, t.name, t.otherfield1, t.otherfield2)
   values (e.id, e.name, e.otherfield1, e.otherfield2)

This will upsert the rows in yourtable in one Oracle command, so all the work will be carried out by the database.

EDIT:

This merge command can be issued over plain JDBC (though I prefer using Spring's SimpleJdbcTemplate)
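For example (a sketch only; the JDBC URL and credentials are placeholders):

import java.sql.*;

// Fire the MERGE as a single statement over plain JDBC; all the row-matching
// work happens inside Oracle.
try (Connection conn = DriverManager.getConnection(
         "jdbc:oracle:thin:@dbhost:1521:ORCL", "user", "password");  // placeholder URL
     Statement stmt = conn.createStatement()) {
    int merged = stmt.executeUpdate(
        "merge into yourtable t using external_table e "
      + "on (t.name = e.name) "
      + "when matched then update set t.id = e.id, "
      + "    t.otherfield1 = e.otherfield1, t.otherfield2 = e.otherfield2 "
      + "when not matched then insert (t.id, t.name, t.otherfield1, t.otherfield2) "
      + "    values (e.id, e.name, e.otherfield1, e.otherfield2)");
    System.out.println(merged + " rows merged");
}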

EDIT2:

In MySQL you can use the following construct to perform the merge:

insert into yourtable (id, name, otherfield1, otherfield2)
values (?, ?, ?, ?), 
       (?, ?, ?, ?), 
       (?, ?, ?, ?) -- repeat for each row in the Excel sheet...
on duplicate key update
    otherfield1 = values(otherfield1),
    otherfield2 = values(otherfield2)

This can be issued as a plain JDBC statement and will perform better than separate updates and inserts, and you can send the rows in batches of (say) a hundred from the spreadsheet. That would mean one JDBC call for every 100 rows in your Excel sheet, which should perform well; see the sketch below. This approach works without external tables, but you'd need a UNIQUE index on the name column for it to work. I wouldn't change the primary key, as that could cause you problems with foreign keys if you ever needed to change somebody's name.
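A sketch of the batching (the Row type and the excelRows list are placeholders for however you read the spreadsheet):

import java.sql.*;
import java.util.List;

// Upsert the spreadsheet contents in chunks of 100 rows per JDBC call.
void upsert(Connection conn, List<Row> excelRows) throws SQLException {
    final int batchSize = 100;
    for (int from = 0; from < excelRows.size(); from += batchSize) {
        List<Row> chunk = excelRows.subList(
            from, Math.min(from + batchSize, excelRows.size()));

        // Build "values (?, ?, ?, ?), (?, ?, ?, ?), ..." for this chunk.
        StringBuilder sql = new StringBuilder(
            "insert into yourtable (id, name, otherfield1, otherfield2) values ");
        for (int i = 0; i < chunk.size(); i++) {
            sql.append(i == 0 ? "(?, ?, ?, ?)" : ", (?, ?, ?, ?)");
        }
        sql.append(" on duplicate key update")
           .append(" otherfield1 = values(otherfield1),")
           .append(" otherfield2 = values(otherfield2)");

        try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
            int p = 1;
            for (Row row : chunk) {
                ps.setLong(p++, row.getId());
                ps.setString(p++, row.getName());
                ps.setString(p++, row.getOtherField1());
                ps.setString(p++, row.getOtherField2());
            }
            ps.executeUpdate();  // one round trip per 100 spreadsheet rows
        }
    }
}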

MySQL doesn't have external tables as such, but its LOAD DATA INFILE command fills the same role, and I think it would be faster still than inserting the data in batches as above. As long as the CSV file is uploaded to a location the database server can read, the import should be quick; see the sketch below.
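For example (a sketch; the file path and connection details are placeholders, and note that REPLACE resolves a duplicate key by deleting the old row and inserting the new one, rather than updating in place):

import java.sql.*;

// Bulk-load the CSV server-side; REPLACE handles rows whose name (or id)
// already exists.
try (Connection conn = DriverManager.getConnection(
         "jdbc:mysql://dbhost/yourdb", "user", "password");  // placeholder URL
     Statement stmt = conn.createStatement()) {
    stmt.execute(
        "load data infile '/mydatadir/mydata.csv' "
      + "replace into table yourtable "
      + "fields terminated by ',' "
      + "(id, name, otherfield1, otherfield2)");
}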

answered Sep 27 '22 by beny23