
What is the best representation for the MongoDB _id field in PostgreSQL?

Tags:

postgresql

The MongoDB _id field is defined as:

ObjectId is a 12-byte BSON type, constructed using:

a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
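To make the layout above concrete, here is a minimal Python sketch that splits the 24-character hex form of an ObjectId into those four parts. The example ID and the names split_object_id and ObjectIdParts are hypothetical. The timestamp and counter are decoded as big-endian integers; the machine and process fields are returned as raw bytes, since the definition above does not fix how to interpret them (newer MongoDB versions describe those 5 middle bytes as a single random value).

```python
from typing import NamedTuple


class ObjectIdParts(NamedTuple):
    timestamp: int  # seconds since the Unix epoch (first 4 bytes, big-endian)
    machine: bytes  # 3-byte machine identifier, kept opaque
    pid: bytes      # 2-byte process id, kept opaque
    counter: int    # 3-byte counter (big-endian), starts at a random value


def split_object_id(hex_id: str) -> ObjectIdParts:
    """Split a 24-character hex ObjectId into the four legacy fields."""
    raw = bytes.fromhex(hex_id)  # 24 hex characters -> 12 raw bytes
    if len(raw) != 12:
        raise ValueError("an ObjectId must be exactly 12 bytes")
    return ObjectIdParts(
        timestamp=int.from_bytes(raw[0:4], "big"),
        machine=raw[4:7],
        pid=raw[7:9],
        counter=int.from_bytes(raw[9:12], "big"),
    )


# Hypothetical example ID, used only for illustration.
parts = split_object_id("54ce0d6e1c9d440000a1b2c3")
print(parts)
```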

What would be the most efficient representation of this field in PostgreSQL?

Ski asked Feb 01 '15 11:02


People also ask

What is the data type of _id?

The _id is the default key generated to uniquely identify each document in a collection. It holds a 12-byte ObjectId, commonly displayed as a hexadecimal string, which consists of: a 4-byte value representing the seconds since the Unix epoch, a 3-byte machine identifier, a 2-byte process id, and a 3-byte counter.

What is the default size of the _id field in MongoDB?

The default unique identifier generated as the primary key (_id) for a MongoDB document is an ObjectId. This is a 12-byte binary value, often represented as a 24-character hex string, and is one of the standard field types supported by the MongoDB BSON specification.

What is the type of _id in MongoDB?

MongoDB provides an automatic unique identifier for the _id field in the form of an ObjectId data type. Anyone familiar with MongoDB documents has likely come across the ObjectId data type in the _id field.

How is _id generated in MongoDB?

MongoDB uses ObjectIds as the default value of the _id field of each document; the value is generated when the document is created. The ObjectId serves as the primary key within a MongoDB collection, uniquely identifying each document or record.


1 Answer

I've used char(24) with the constraint CHECK (decode(mongo_id::text, 'hex'::text) > '\x30'::bytea). This constraint doesn't check that the value is a sane ObjectId, but because decode() raises an error on input that is not valid hexadecimal, it does reject badly formatted values. Storing the ObjectId as plain text also keeps the values easily readable.
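If a stricter format rule is wanted (exactly 24 hexadecimal characters), it can be sketched as below in Python, for example to validate IDs in application code before inserting them. The helper name is_object_id_hex is hypothetical; in PostgreSQL the same rule could presumably be written as a regular-expression constraint along the lines of CHECK (mongo_id ~ '^[0-9a-fA-F]{24}$'), which is an assumption rather than the answerer's setup.

```python
import re

# Hypothetical rule: the canonical text form of a 12-byte ObjectId is
# exactly 24 hexadecimal digits (either case), with nothing else around it.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def is_object_id_hex(value: str) -> bool:
    """Return True if value has the 24-hex-digit text form of an ObjectId."""
    # fullmatch rejects shorter/longer strings and stray padding or spaces
    return _OBJECT_ID_RE.fullmatch(value) is not None


print(is_object_id_hex("54ce0d6e1c9d440000a1b2c3"))
print(is_object_id_hex("not-an-object-id"))
```

Unlike the decode()-based check, this also rejects values of the wrong length.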

Another option is to use the bytea type for the column and input the data as "\xOBJECT_ID", where the \x prefix tells PostgreSQL to read the text form of OBJECT_ID as hex and convert it to a byte array. This consumes less space than char(24), since only 12 bytes of payload are stored instead of 24 characters (this may matter if you have millions of rows), but reading the values back in a non-binary form requires e.g. encode(mongo_id::bytea, 'hex'), which can be burdensome.
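The conversion between the two representations can be sketched in Python. The helper names are hypothetical: to_bytea_literal builds the "\x..." hex input string described above, and from_bytea turns 12 raw bytes read from a bytea column back into the readable hex form, which is what encode(..., 'hex') does on the database side.

```python
def to_bytea_literal(hex_id: str) -> str:
    """Build the PostgreSQL hex-format bytea input string for an ObjectId."""
    raw = bytes.fromhex(hex_id)  # 24 hex characters -> 12 raw bytes
    if len(raw) != 12:
        raise ValueError("an ObjectId must be exactly 12 bytes")
    return "\\x" + raw.hex()  # e.g. \x54ce0d6e...


def from_bytea(raw: bytes) -> str:
    """Convert 12 raw bytes (as read from a bytea column) back to hex text."""
    return raw.hex()  # always lowercase hex


hex_id = "54ce0d6e1c9d440000a1b2c3"  # hypothetical example ID
raw = bytes.fromhex(hex_id)
print(len(hex_id), len(raw))  # characters in the text form vs raw bytes
print(to_bytea_literal(hex_id))
print(from_bytea(raw))
```

In practice, many client libraries (psycopg2, for example) can pass Python bytes directly as a bytea parameter, so building the literal by hand is mostly useful for illustration or hand-written SQL.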

Also, some platforms, such as Amazon Redshift, may have problems with the bytea data type.

If you need easy access to the metadata in the ObjectId, you could parse it and store it separately (e.g. in a jsonb column or in a separate column for each relevant attribute). The "created at" part of the metadata is probably the only interesting attribute.
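As a sketch of extracting that "created at" value, the Python function below decodes the first 4 bytes of the ObjectId (big-endian seconds since the Unix epoch) into a timezone-aware UTC datetime, which could then be written to a separate timestamptz column. The function name and the example ID are hypothetical.

```python
from datetime import datetime, timezone


def object_id_created_at(hex_id: str) -> datetime:
    """Decode the creation time embedded in an ObjectId as a UTC datetime."""
    raw = bytes.fromhex(hex_id)  # 24 hex characters -> 12 raw bytes
    if len(raw) != 12:
        raise ValueError("an ObjectId must be exactly 12 bytes")
    seconds = int.from_bytes(raw[:4], "big")  # first 4 bytes: epoch seconds
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = object_id_created_at("54ce0d6e1c9d440000a1b2c3")
print(created.isoformat())  # 2015-02-01T11:26:38+00:00
```

Because the timestamp has one-second resolution, it records roughly when the ID was generated, not a precise insert time.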

Petrus Repo answered Oct 31 '22 18:10