There are two tables:

Authorized Contacts (auth_contacts):
(
    userid   varchar,
    contacts jsonb
)
contacts contains an array of contacts with attributes {contact_id, type}.

discussion:
(
    contact_id         varchar,
    discussion_id      varchar,
    discussion_details jsonb
)
The table auth_contacts has at least 100k records. Making it a fully relational (non-JSONB) design did not seem appropriate to me, as it would double or triple the number of records.
Sample data for auth_contacts:

userid  | contacts
'11111' | '{"contact": [{"type": "type_a", "contact_id": "1-A-12"},
                        {"type": "type_b", "contact_id": "1-A-13"}]}'
The discussion table has some 5 million records.
I want to join discussion.contact_id (a relational column) with contact_id, which is a key inside each JSON object of the array stored in auth_contacts.contacts.
One very crude way is:
SELECT *
FROM discussion d
JOIN (SELECT userid, JSONB_OBJECT_KEYS(a.contacts) AS auth_contact
FROM auth_contacts a) AS contacts
ON (d.contact_id = contacts.auth_contact::text)
What this does is effectively build, at runtime (in the inner SQL), a userid-vs-contact_id table, which is exactly what I was trying to avoid by going for the JSONB data type. For a user with many records this query takes 26+ seconds, which is not good at all. I tried a few other ways, e.g. PostgreSQL 9.4: Aggregate / Join table on JSON field id inside array.
But there should be a cleaner and better way that would be as simple as
JOIN ON d.contact_id = contacts -> contact -> contact_id ?
When I try this, it doesn't yield any results. Searching the net, this seems to be a pretty cumbersome task.
Querying the JSON document
PostgreSQL has two native operators, -> and ->>, to query JSON documents. The first operator, ->, returns a JSON object, while ->> returns text. These operators work on both json and jsonb columns; additional operators are available for jsonb columns only.
json processes input faster than jsonb, since no conversion is involved; jsonb converts the JSON data into a binary form, so input is slightly slower due to that conversion overhead. There is no change in the schema design when working with JSON.
In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about the ordering of object keys. RFC 7159 specifies that JSON strings should be encoded in UTF-8. Most applications should use jsonb for schemaless data: it stores parsed JSON in a binary format, so queries are efficient.
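As a quick illustration of the difference between -> and ->> (the literal below is just the sample contact document from the question, queried ad hoc):

SELECT c -> 'contact' -> 0 ->  'contact_id' AS as_jsonb,  -- "1-A-12" (jsonb string, with quotes)
       c -> 'contact' -> 0 ->> 'contact_id' AS as_text    -- 1-A-12   (plain text)
FROM  (SELECT '{"contact": [{"type": "type_a", "contact_id": "1-A-12"}]}'::jsonb AS c) AS sub;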
Your "crude way" doesn't actually work. Here is another crude way that does:
SELECT *
FROM auth_contacts a
, jsonb_to_recordset(a.contacts->'contact') AS c(contact_id text)
JOIN discussion d USING (contact_id);
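With the sample row above, this yields one result row per contact_id that also appears in discussion. To look at a single user, a filter can simply be added (the userid value is taken from the sample data):

SELECT d.discussion_id, c.contact_id
FROM   auth_contacts a
     , jsonb_to_recordset(a.contacts -> 'contact') AS c(contact_id text)
JOIN   discussion d USING (contact_id)
WHERE  a.userid = '11111';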
As has been commented, you can also formulate a join condition with the contains operator @>
:
SELECT *
FROM auth_contacts a
JOIN discussion d ON a.contacts->'contact'
@> json_build_array(json_build_object('contact_id', d.contact_id))::jsonb
But rather use JSON creation functions than string concatenation. This looks cumbersome, but it will actually be very fast if supported by a functional jsonb_path_ops GIN index:
CREATE INDEX auth_contacts_contacts_gin_idx ON auth_contacts
USING gin ((contacts->'contact') jsonb_path_ops);
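One way to verify that the index is actually picked up is to inspect the query plan (a sketch only; the actual plan depends on data distribution and planner settings):

-- The @> predicate on (contacts->'contact') can use the jsonb_path_ops GIN index
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   auth_contacts a
JOIN   discussion d ON a.contacts -> 'contact'
    @> json_build_array(json_build_object('contact_id', d.contact_id))::jsonb;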
Details:
This is all fascinating to play with, but the problem here is the relational model. Your claim:

making it non-JSONB is not appropriate as it would double or triple the amount of records

is the opposite of what's right. It's nonsense to wrap IDs you need for joining tables into a JSON document type. Normalize your tables with a many-to-many relationship and implement all IDs you work with inside the DB as separate columns with appropriate data types. Basics:
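A normalized layout along those lines might look like this (a sketch only; table and constraint names beyond those in the question are assumptions):

-- Hypothetical normalized schema: contacts become rows, not JSON array elements
CREATE TABLE app_user (
    userid varchar PRIMARY KEY
);

CREATE TABLE contact (
    contact_id varchar PRIMARY KEY,
    type       text
);

-- many-to-many relationship between users and contacts
CREATE TABLE user_contact (
    userid     varchar REFERENCES app_user,
    contact_id varchar REFERENCES contact,
    PRIMARY KEY (userid, contact_id)
);

-- discussion then joins on a plain relational column
SELECT d.*
FROM   user_contact uc
JOIN   discussion d USING (contact_id)
WHERE  uc.userid = '11111';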