
PostgreSQL: BYTEA vs OID+Large Object?

I started an application with Hibernate 3.2 and PostgreSQL 8.4. I have some byte[] fields that were mapped as @Basic (= PG bytea) and others that got mapped as @Lob (=PG Large Object). Why the inconsistency? Because I was a Hibernate noob.

Now, those fields are at most 4 KB (2-3 KB on average). The PostgreSQL documentation mentions that LOs are good when the fields are big, but I didn't see what 'big' meant.

I have since upgraded to PostgreSQL 9.0 with Hibernate 3.6, and I was forced to change the annotation to @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType"). This bug exposed a potential compatibility issue, and I eventually found out that Large Objects are a pain to deal with compared to a normal field.

So I am thinking of changing all of it to bytea. But I am concerned that bytea fields are encoded in Hex, so there is some overhead in encoding and decoding, and this would hurt the performance.

Are there good benchmarks comparing the performance of the two? Has anybody made the switch and seen a difference?

malaverdiere asked Jan 10 '11 08:01



2 Answers

tl;dr Use bytea unless you need "streaming."

bytea is a byte sequence and works like any other value.

Large objects are split up into multiple rows. This allows you to seek, read, and write large objects like an OS file. You can operate on them without loading the entire thing into memory at once.
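To make the "split up into multiple rows" point concrete, here is a minimal sketch of the page arithmetic. It assumes a default PostgreSQL build, where each pg_largeobject row holds up to 2048 bytes (LOBLKSIZE, a quarter of the 8 kB block size). The class and method names are hypothetical and only model the layout; they are not part of any driver API.

```java
// Hypothetical model of how a large object maps onto pg_largeobject rows.
final class LargeObjectPagesSketch {
    // Assumption: default build, LOBLKSIZE = BLCKSZ / 4 = 2048 bytes per row.
    static final int LOBLKSIZE = 2048;

    // Number of rows (pages) needed to hold an object of the given size.
    static long pageCount(long sizeInBytes) {
        if (sizeInBytes < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        return (sizeInBytes + LOBLKSIZE - 1) / LOBLKSIZE;
    }

    // First and last page number touched by reading `length` bytes at `offset`.
    static long[] pagesTouched(long offset, int length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset >= 0 and length > 0 required");
        }
        long first = offset / LOBLKSIZE;
        long last = (offset + length - 1) / LOBLKSIZE;
        return new long[] {first, last};
    }

    public static void main(String[] args) {
        // A 4 KB field, like the ones in the question, spans two rows.
        System.out.println(pageCount(4096));
        // A 100-byte read one million bytes into a big object touches one row.
        long[] range = pagesTouched(1_000_000L, 100);
        System.out.println(range[0] + ".." + range[1]);
    }
}
```

Under these assumptions, a seek into the middle of a large object only needs the rows that cover the requested range, which is what makes file-like access possible. For 2-4 KB fields the object is only one or two rows anyway, so there is little to gain from it.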

However, large objects have downsides:

  1. There is only one large object table per database.

  2. Large objects aren't automatically removed when the "owning" record is deleted. (Technically, a large object can be referenced by several records.) See the lo_manage function in the lo module.

  3. Since there is only one table, large object permissions have to be handled record by record.

  4. Streaming is difficult, and is less well supported by client drivers than simple bytea.

  5. It's part of the system schema, so you have limited to no control over options like partitioning and tablespaces.

In terms of capacity, there isn't a huge difference. bytea is limited to 1GB; large objects are limited to 2GB (raised to 4TB in PostgreSQL 9.3). If 1GB is too limiting, probably 2GB is as well.

I venture to guess that 93% of real-world uses of large objects would be better served by using bytea.

Paul Draper answered Oct 11 '22 12:10


Basically there are cases where each makes sense. bytea is simpler and generally preferred. The client libs give you the decoding so that's not an issue.
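As a rough illustration of what the hex encoding costs, the sketch below mimics PostgreSQL's hex text format for bytea: a `\x` prefix followed by two hex digits per byte. The class and method names are hypothetical; a real client library does this conversion for you, and the hex form is only a text representation, not how the value is stored on disk.

```java
// Hypothetical helper that mimics the bytea hex text format ("\x" + 2 digits per byte).
final class ByteaHexSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Encode raw bytes as the hex text form, e.g. {0x00, 0xff} becomes \x00ff.
    static String toHexFormat(byte[] data) {
        StringBuilder sb = new StringBuilder(2 + data.length * 2);
        sb.append("\\x");
        for (byte b : data) {
            sb.append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    // Decode the hex text form back into raw bytes.
    static byte[] fromHexFormat(String text) {
        if (!text.startsWith("\\x") || text.length() % 2 != 0) {
            throw new IllegalArgumentException("not in bytea hex format");
        }
        byte[] out = new byte[(text.length() - 2) / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(text.charAt(2 + 2 * i), 16);
            int lo = Character.digit(text.charAt(3 + 2 * i), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        // A 4 KB field, the maximum size mentioned in the question.
        byte[] field = new byte[4096];
        System.out.println(toHexFormat(field).length()); // 8194 characters
    }
}
```

So the text form is about twice the size of the raw bytes, and the conversion is a single linear pass. For fields of a few KB that cost is small, which is consistent with the point that decoding is not an issue in practice.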

However LOBs have some neat features, such as an ability to seek within them and treat the LOB as a byte stream instead of a byte array.

"Big" means "big enough that you don't want to send it to the client all at once." Technically bytea is limited to 1GB compressed and a LOB is limited to 2GB compressed, but really you hit the other limit first anyway. If it's big enough that you don't want it directly in your result set and you don't want to send it to the client all at once, use a LOB.

Chris Travers answered Oct 11 '22 12:10