The MongoDB _id field is defined as:
ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
What would be the most efficient representation of this field in PostgreSQL?
The _id is the default key that is generated to uniquely identify each document in the collection. The 12-byte ObjectId, which is usually displayed as a hexadecimal string, consists of a 4-byte value representing the seconds since the Unix epoch, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter.
The default unique identifier generated as the primary key (_id) for a MongoDB document is an ObjectId. This is a 12-byte binary value, often represented as a 24-character hex string, and one of the standard field types supported by the MongoDB BSON specification.
MongoDB provides an automatic unique identifier for the _id field in the form of the ObjectId data type. If you are familiar with MongoDB documents, you have likely come across it in the _id field.
MongoDB uses an ObjectId as the default value of each document's _id field and generates it when the document is created. The ObjectId is treated as the primary key within any MongoDB collection: it is a unique identifier for each document or record.
I've used char(24) with the constraint CHECK (decode(mongo_id::text, 'hex'::text) > '\x30'::bytea). This constraint doesn't check that the ObjectId is sane, but because decode raises an error on input that is not valid hex, only values in a valid format can be stored. This stores the ObjectId as plain text, which keeps the values easily readable.
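If you also want to reject malformed values before they reach the database, the same kind of format check can be sketched on the client side. This is only an illustration in Python, not part of the PostgreSQL setup above; the function name is_object_id_hex is made up for the example, and the sample value is just a commonly seen example ObjectId.

```python
import re

# Exactly 24 hexadecimal digits: the usual text form of a 12-byte ObjectId.
_OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def is_object_id_hex(value: str) -> bool:
    """Return True if value is a 24-character hex string (hypothetical helper)."""
    return _OBJECT_ID_HEX.fullmatch(value) is not None


print(is_object_id_hex("507f1f77bcf86cd799439011"))  # sample value
print(is_object_id_hex("not-an-object-id"))
```

Like the CHECK constraint, this only validates the format; it says nothing about whether the ObjectId actually exists in MongoDB.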
Another option is to use the bytea type for the column and input the data as "\xOBJECT_ID", where the \x prefix makes PostgreSQL read the hex text form of OBJECT_ID as a byte array. This consumes less space than char(24), which might be relevant if you have millions of rows, but accessing the values in a non-binary format requires something like encode(mongo_id::bytea, 'hex'), which might be burdensome.
Also, some platforms such as Redshift might have problems with the bytea data type.
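To make the size difference between the two representations concrete, here is a small Python sketch of the conversion an application would do on its side. It is not PostgreSQL code; the helper names object_id_to_bytes and object_id_to_hex are invented for the example, and the sample value is just an illustrative ObjectId.

```python
def object_id_to_bytes(hex_id: str) -> bytes:
    """Convert the hex text form to the raw 12-byte form (hypothetical helper)."""
    raw = bytes.fromhex(hex_id)  # raises ValueError on non-hex input
    if len(raw) != 12:
        raise ValueError("an ObjectId must decode to exactly 12 bytes")
    return raw


def object_id_to_hex(raw: bytes) -> str:
    """Convert the raw 12-byte form back to lowercase hex text."""
    return raw.hex()


hex_id = "507f1f77bcf86cd799439011"  # sample value
raw = object_id_to_bytes(hex_id)

# The text form carries 24 characters, the binary form 12 bytes of payload.
print(len(hex_id), len(raw))
print(object_id_to_hex(raw) == hex_id)
```

The numbers only cover the payload; whatever per-value overhead the database adds comes on top of that for both column types.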
If you need easy access to the metadata in the ObjectId, you could parse it and store it separately (e.g. in a jsonb column or a separate column for each relevant attribute). The "created at" part of the metadata is possibly the only interesting attribute.
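As a rough sketch of that parsing step, the following Python splits an ObjectId according to the layout quoted in the question. The names ObjectIdParts and parse_object_id are made up for this example. The timestamp and counter are big-endian; treating the machine and process ids as big-endian too is an assumption. Newer MongoDB versions describe the middle five bytes as a single random value, so those two fields may not be meaningful for recent ids.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ObjectIdParts:
    created_at: datetime  # from the leading 4-byte Unix timestamp
    machine_id: int       # next 3 bytes (byte order assumed big-endian)
    process_id: int       # next 2 bytes (byte order assumed big-endian)
    counter: int          # last 3 bytes


def parse_object_id(hex_id: str) -> ObjectIdParts:
    """Split a 24-character hex ObjectId into its parts (hypothetical helper)."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId must decode to exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")
    return ObjectIdParts(
        created_at=datetime.fromtimestamp(seconds, tz=timezone.utc),
        machine_id=int.from_bytes(raw[4:7], "big"),
        process_id=int.from_bytes(raw[7:9], "big"),
        counter=int.from_bytes(raw[9:12], "big"),
    )


parts = parse_object_id("507f1f77bcf86cd799439011")  # sample value
print(parts.created_at.isoformat())  # 2012-10-17T21:13:27+00:00
```

If only the creation time matters, created_at could go into a timestamptz column next to the id and the remaining fields could be dropped.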