What is the best representation for the MongoDB _id field in PostgreSQL?

Tags:

postgresql

The MongoDB _id field is an ObjectId, which is defined as:

ObjectId is a 12-byte BSON type, constructed using:

a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.

What would be the most efficient representation of this field in PostgreSQL?


asked Feb 01 '15 11:02

1 Answer

I've used char(24) with the constraint CHECK (decode(mongo_id::text, 'hex'::text) > '\x30'::bytea). This constraint doesn't check that the ObjectId is sane, but decode raises an error on input that isn't valid hex, so only well-formed values can be stored. The ObjectId is stored as plain text, which keeps the values easily readable.
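The same format check can also be run in application code before a row is inserted. This is a minimal sketch in Python, assuming a hypothetical helper named is_objectid_hex (it is not part of any driver or of the constraint above):

```python
import re

# Hypothetical pattern: exactly 24 hexadecimal characters,
# i.e. the text form of a 12-byte ObjectId.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def is_objectid_hex(value: str) -> bool:
    """Return True if value looks like the 24-character hex form of an ObjectId."""
    return _OBJECT_ID_RE.fullmatch(value) is not None
```

Like the CHECK constraint, this only validates the format; it says nothing about whether the id was ever issued by MongoDB.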

Another option is to use the bytea type for the column and input the data as '\xOBJECT_ID', where the \x prefix tells PostgreSQL to read the text form of OBJECT_ID as hex and store it as a byte array. This consumes less space than char(24), since it holds 12 bytes of data instead of 24 characters (which might be relevant if you have millions of rows), but reading the values in a non-binary format requires something like encode(mongo_id::bytea, 'hex'), which might be burdensome.
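For the bytea variant, the conversion between the hex text and the raw bytes can also happen on the client. This is a rough sketch in Python; the function names objectid_to_bytes and bytes_to_objectid are made up for illustration:

```python
def objectid_to_bytes(hex_id: str) -> bytes:
    """Convert the 24-character hex form of an ObjectId to its 12 raw bytes."""
    raw = bytes.fromhex(hex_id)  # raises ValueError on non-hex input
    if len(raw) != 12:
        raise ValueError("an ObjectId must be exactly 12 bytes")
    return raw


def bytes_to_objectid(raw: bytes) -> str:
    """Convert 12 raw bytes back to the lowercase hex form."""
    if len(raw) != 12:
        raise ValueError("an ObjectId must be exactly 12 bytes")
    return raw.hex()
```

The 12-byte value is what would be bound to the bytea column; the hex string is only rebuilt when a readable form is needed.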

Also, some platforms such as Redshift might have problems with the bytea data type.

If you need easy access to the metadata in the ObjectId, you could parse it and store it separately (e.g. in a jsonb column, or in a separate column for each relevant attribute). The "created at" timestamp is possibly the only interesting attribute.
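Extracting the "created at" part could look roughly like this in Python. The sketch assumes the layout quoted in the question (the first 4 bytes are big-endian seconds since the Unix epoch), and objectid_created_at is a hypothetical name:

```python
from datetime import datetime, timezone


def objectid_created_at(hex_id: str) -> datetime:
    """Return the creation time encoded in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId must be exactly 12 bytes")
    # Bytes 0-3: seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_created_at("507f1f77bcf86cd799439011").isoformat())
# prints 2012-10-17T21:13:27+00:00
```

The result could then go into a timestamptz column next to the id.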


answered Oct 31 '22 18:10

Petrus Repo
