We had a legacy system that attempted to keep track of all versions of data saved of a particular document. We originally stored our JSON as a string in some old version of Postgres, but recently we upgraded to Postgres 9.3 and we started using the JSON column type.
We had a column called "versions", and it held an an array, and each saved version of a particular document was stored in the array, so a query like this:
SELECT _data_as_json FROM measurements WHERE id = 3307551
returned JSON like this:
{"reports": {}, "versions": [
{"timestamp": "2014-04-28T19:12:31.567415", "user": 11327, "legacy": {}, "vd_version": 1},
{"timestamp": "2014-05-12T18:03:24.417029", "user": 11331, "legacy": {"lengthmoment": {"moment": {"size": 130}, "length": {"in": 64.0}}, "comments": "", "custom": null}, "vd_version": 1},
{"timestamp": "2014-05-12T21:52:50.045758", "user": 10373, "legacy": {"lengthmoment": {"moment": {"size": 130}, "length": {"in": 64.0}}, "comments": "", "custom": null}, "vd_version": 1},
{"timestamp": "2014-05-14T23:34:37.797822", "user": 10380, "legacy": {"lengthmoment": {"moment": {"size": 130}, "length": {"in": 64.0}}, "comments": "", "custom": null}, "vd_version": 1},
{"timestamp": "2014-07-16T14:56:38.667363", "user": 10374, "legacy": {"lengthmoment": {"moment": {"size": 130}, "length": {"in": 64.0}}, "comments": "", "custom": null}, "vd_version": 1},
{"timestamp": "2014-07-16T14:57:47.341541", "user": 10374, "legacy": {"lengthmoment": {"moment": {"size": 130}, "length": {"in": 64.0}}, "comments": "", "custom": null}, "vd_version": 1},
{"timestamp": "2014-07-17T16:32:09.067026", "user": 11331, "legacy": {"lengthmoment": {"moment": {"size": 130}, "length": {"in": 64.0}}, "comments": "", "custom": null}, "vd_version": 1},
{"timestamp": "2014-09-11T14:35:44.436886", "user": 11331, "legacy": {"lengthmoment": {"moment": {"size": 130}, "length": {"in": 64.0}}, "comments": "", "custom": null}, "vd_version": 1},
{"timestamp": "2014-10-15T14:30:50.554932", "user": 10383, "legacy": {"lengthmoment": {"moment": {"size": 130}, "length": {"in": 64.0}}, "comments": "", "custom": null}, "vd_version": 1},
{"timestamp": "2014-10-29T15:36:35.183787", "user": 11331, "legacy": {"lengthmoment": {"moment": {"size": 130}, "length": {"in": 64.0}}, "comments": "", "custom": null}, "vd_version": 1},
{"timestamp": "2014-11-12T22:22:03.892484", "user": 10373, "legacy": {"lengthmoment": {"moment": {"size": 130}, "length": {"in": 64.0}}, "comments": "", "custom": null}, "vd_version": 1}
]}
We (attempted to) stored the data in "versions" in chronological order, but 99% of the time, we only need the last document. In Postgres 9.3, we came up with this query to get the last item:
SELECT json_array_elements(_data_as_json->'versions')
FROM measurements
WHERE id = 3307551
LIMIT 1 OFFSET (SELECT json_array_length(_data_as_json->'versions') - 1 FROM measurements WHERE id = 3307551)
This basically works, but it is a little fragile. If we ever fail to order things correctly in the versions array, then we get back the wrong version of the document. I'm curious if there is a better way to do this? I have read that Postgres 9.4 offers more functions for dealing with JSON.
Ideally, we could do ORDER BY on the "timestamp". Is that possible?
The job is simple now since, quoting the manual:
The field/element/path extraction operators that accept integer JSON array subscripts all support negative subscripting from the end of arrays.
Bold emphasis mine. So for either json
or jsonb
:
SELECT data->'versions'->>-1
FROM measurements m
WHERE id = 3307551;
You may want to use jsonb
instead of json
. Use jsonb_array_elements()
or jsonb_array_length()
accordingly.
There is a general approach to get the last element according to original sort order using WITH ORDINALITY
(slightly slower without):
SELECT v.ver
FROM measurements m
, jsonb_array_elements(m.data->'versions') WITH ORDINALITY v(ver, ord)
WHERE m.id = 3307551
ORDER BY v.ord DESC
LIMIT 1;
Details for WITH ORDINALITY
(and the implicit JOIN LATERAL
in both versions):
"last" according to the timestamp value:
SELECT v.ver
FROM measurements m
, json_array_elements(m.data->'versions') v(ver)
WHERE m.id = 3307551
ORDER BY (v.ver->>'timestamp')::timestamp DESC
LIMIT 1;
"last" according to ordinal position in the json
array (faster):
SELECT data->'versions'->(json_array_length(data->'versions') - 1)
FROM measurements
WHERE id = 3307551;
We need - 1
because JSON arrays start at offset 0.
db<>fiddle here
Old sqlfiddle
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With