 

Performance problem with Java DB Derby blobs & delete

I’ve been experiencing a performance problem with deleting blobs in Derby, and was wondering if anyone could offer any advice.

This is primarily with 10.4.2.0 under Windows and Solaris, although I’ve also tested with the new 10.5.1.1 release candidate (as it has many LOB changes), but that makes no significant difference.

The problem is that with a table containing many large blobs, deleting a single row can take a long time (often over a minute).

I’ve reproduced this with a small test that creates a table, inserts a few rows with blobs of differing sizes, then deletes them.

The table schema is simple, just:

create table blobtest( id integer generated BY DEFAULT as identity, b blob )

and I’ve then created seven rows with the following blob sizes: 1024 bytes, 1MB, 10MB, 25MB, 50MB, 75MB, 100MB.

I’ve read the blobs back to check that they were created properly and are the correct size.

They were then deleted using the SQL statement “delete from blobtest where id = X”.
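
For reference, here is a minimal sketch of the test described above (the class name, connection URL, and zero-filled byte arrays are illustrative assumptions, not the exact test code):

    import java.sql.*;

    public class BlobDeleteTest {
        public static void main(String[] args) throws Exception {
            // Embedded Derby; on older JVMs you may first need
            // Class.forName("org.apache.derby.jdbc.EmbeddedDriver")
            Connection conn = DriverManager.getConnection("jdbc:derby:testdb;create=true");
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("create table blobtest( id integer generated BY DEFAULT as identity, b blob )");
            }
            int[] sizes = {1024, 1 << 20, 10 << 20, 25 << 20, 50 << 20, 75 << 20, 100 << 20};
            try (PreparedStatement ins = conn.prepareStatement("insert into blobtest(b) values (?)")) {
                for (int size : sizes) {
                    ins.setBytes(1, new byte[size]); // zero-filled blob of the given size
                    ins.executeUpdate();
                }
            }
            // Identity values start at 1, so ids 1..7 match the insert order
            try (PreparedStatement del = conn.prepareStatement("delete from blobtest where id = ?")) {
                for (int id = 1; id <= sizes.length; id++) {
                    long t0 = System.currentTimeMillis();
                    del.setInt(1, id);
                    del.executeUpdate();
                    System.out.println("delete id=" + id + " took " + (System.currentTimeMillis() - t0) + " ms");
                }
            }
            conn.close();
        }
    }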

If I delete the rows in the order I created them, the average timings to delete a single row are:

1024 bytes: 19.5 seconds
1MB: 16 seconds
10MB: 18 seconds
25MB: 15 seconds
50MB: 17 seconds
75MB: 10 seconds
100MB: 1.5 seconds

If I delete them in reverse order, the average timings to delete a single row are:

100MB: 20 seconds
75MB: 10 seconds
50MB: 4 seconds
25MB: 0.3 seconds
10MB: 0.25 seconds
1MB: 0.02 seconds
1024 bytes: 0.005 seconds

If I create seven small blobs, delete times are all instantaneous.

The delete time thus appears to be related more to the overall size of the rows in the table than to the size of the blob being removed.

I’ve run the tests a few times, and the results seem reproducible.

So, does anyone have any explanation for the performance, and any suggestions on how to work around it or fix it? It does make using large blobs quite problematic in a production environment…

asked May 21 '09 by ariso

3 Answers

I have exactly the same issue you have.

I found that when I do a DELETE, Derby actually reads through the large segment file completely. I used Filemon.exe to observe how it runs.

My file is 940MB, and it takes 90 seconds to delete just a single row.

I believe that Derby stores the table data in a single file, and somehow a design/implementation bug causes it to read through everything rather than use a proper index.

I do batched deletes to work around this problem. I rewrote part of my program: it originally used "where id=?" in auto-commit mode; now it uses "where ID IN(?,.......?)" enclosed in a transaction.

The total time dropped to about 1/1000 of what it was before.
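
For illustration, a minimal JDBC sketch of this batched approach (the blobtest table is carried over from the question; the class and helper names are my own assumptions):

    import java.sql.*;
    import java.util.List;

    public class BatchDelete {
        // Build "delete from blobtest where id in (?,...,?)" for n ids
        static String buildInClause(int n) {
            StringBuilder sb = new StringBuilder("delete from blobtest where id in (");
            for (int i = 0; i < n; i++) sb.append(i == 0 ? "?" : ",?");
            return sb.append(")").toString();
        }

        // Delete all ids in one statement inside one explicit transaction,
        // instead of one auto-committed delete per row
        static void deleteBatch(Connection conn, List<Integer> ids) throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(buildInClause(ids.size()))) {
                for (int i = 0; i < ids.size(); i++) {
                    ps.setInt(i + 1, ids.get(i));
                }
                ps.executeUpdate();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }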

I suggest you add a "mark as deleted" column, with a scheduled job that does the actual deletion in batches.
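
A rough sketch of that idea, assuming a "deleted" flag column has been added to the table (the column and method names are mine, not part of the answer):

    import java.sql.*;

    public class DeferredDelete {
        // Called from the application: cheap, just flips a flag
        static void markDeleted(Connection conn, int id) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement("update blobtest set deleted = 1 where id = ?")) {
                ps.setInt(1, id);
                ps.executeUpdate();
            }
        }

        // Called from a scheduled job: the expensive purge runs off the hot path
        static void purgeDeleted(Connection conn) throws SQLException {
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("delete from blobtest where deleted = 1");
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }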

answered Nov 08 '22 by Dennis C


As far as I can tell, Derby will only store BLOBs inline with the other database data, so you end up with the BLOB split up over a ton of separate DB page files. This BLOB storage mechanism is good for ACID, and good for smaller BLOBs (say, image thumbnails), but breaks down with larger objects. According to the Derby docs, turning autocommit off when manipulating BLOBs may also improve performance, but this will only go so far.

I strongly suggest you migrate to H2 or another DBMS if good performance on large BLOBs is important, and the BLOBs must stay within the DB. You can use the SQuirrel SQL client and its DBCopy plugin to directly migrate between DBMSes (you just need to point it to the Derby/JavaDB JDBC driver and the H2 driver). I'd be glad to help with this part, since I just did it myself, and haven't been happier.

Failing this, you can move the BLOBs out of the database and into the filesystem. To do this, you would replace the BLOB column in the database with a BLOB size (if desired) and location (a URI or platform-dependent file string). When creating a new blob, you create a corresponding file in the filesystem. The location could be based on a given directory, with the primary key appended. For example, your DB is in "DBFolder/DBName", your blobs go in "DBFolder/DBName/Blob", and they have filename "BLOB_PRIMARYKEY.bin" or somesuch. To edit or read the BLOBs, you query the DB for the location and then read/write the file directly. Then you log the new file size to the DB if it changed.
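
A short sketch of this scheme (the "blobmeta" table and the class/method names are illustrative assumptions):

    import java.nio.file.*;
    import java.sql.*;

    public class FileBlobStore {
        private final Connection conn;
        private final Path blobDir;

        FileBlobStore(Connection conn, Path blobDir) throws Exception {
            this.conn = conn;
            this.blobDir = Files.createDirectories(blobDir); // e.g. DBFolder/DBName/Blob
        }

        // Write the bytes to disk; store only size + location in the DB
        void store(int id, byte[] data) throws Exception {
            Path file = blobDir.resolve("BLOB_" + id + ".bin");
            Files.write(file, data);
            try (PreparedStatement ps = conn.prepareStatement(
                    "insert into blobmeta(id, size, location) values (?, ?, ?)")) {
                ps.setInt(1, id);
                ps.setLong(2, data.length);
                ps.setString(3, file.toString());
                ps.executeUpdate();
            }
        }

        // Query the DB for the location, then read the file directly
        byte[] load(int id) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(
                    "select location from blobmeta where id = ?")) {
                ps.setInt(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) throw new IllegalArgumentException("no blob with id " + id);
                    return Files.readAllBytes(Paths.get(rs.getString(1)));
                }
            }
        }
    }

Deleting a blob is then just removing the file plus its small metadata row, which sidesteps the slow Derby blob delete entirely.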

answered Nov 08 '22 by BobMcGee


I'm sure this isn't the answer you want, but for a production environment with throughput requirements I wouldn't use Java DB. MySQL is just as free and will handle your requirements a lot better. I think you are really just beating your head against a limitation of the solution you've chosen.

I generally only use Derby as a test case, and especially only when my entire DB can fit easily into memory. YMMV.

answered Nov 08 '22 by Gandalf