Using JSON arrays in a jsonb
column in Postgres 9.4 and Rails, I can set up a scope that returns all rows containing any elements from an array passed to the scope method - like so:
scope :tagged, ->(tags) {
where(["data->'tags' ?| ARRAY[:tags]", { tags: tags }])
}
I'd also like to order the results based on the number of matched elements in the array.
I appreciate I might need to step outside the confines of ActiveRecord to do this, so a vanilla Postgres SQL answer is helpful too, but bonus points if it can be wrapped up in ActiveRecord so it can be a chain-able scope.
As requested, here's an example table. (Actual schema is far more complicated but this is all I'm concerned about.)
id | data
----+-----------------------------------
1 | {"tags": ["foo", "bar", "baz"]}
2 | {"tags": ["bish", "bash", "baz"]}
3 |
4 | {"tags": ["foo", "foo", "foo"]}
The use case is to find related content based on tags. More matching tags are more relevant, hence results should be ordered by the number of matches. In Ruby I'd have a simple method like this:
Page.tagged(['foo', 'bish', 'bash', 'baz']).all
Which should return the pages in the following order: 2, 1, 4
.
Your arrays contain only primitive values, nested documents would be more complicated.
Unnest the JSON arrays of found rows with jsonb_array_elements_text()
in a LATERAL
join and count matches:
SELECT *
FROM (
SELECT *
FROM tbl
WHERE data->'tags' ?| ARRAY['foo', 'bar']
) t
, LATERAL (
SELECT count(*) AS ct
FROM jsonb_array_elements_text(t.data->'tags') a(elem)
WHERE elem = ANY (ARRAY['foo', 'bar']) -- same array parameter
) ct
ORDER BY ct.ct DESC; -- more expressions to break ties?
Alternative with INSTERSECT
. It's one of the rare occasions that we can make use of this basic SQL feature:
SELECT *
FROM (
SELECT *
FROM tbl
WHERE data->'tags' ?| '{foo, bar}'::text[] -- alt. syntax w. array
) t
, LATERAL (
SELECT count(*) AS ct
FROM (
SELECT * FROM jsonb_array_elements_text(t.data->'tags')
INTERSECT ALL
SELECT * FROM unnest('{foo, bar}'::text[]) -- same array literal
) i
) ct
ORDER BY ct.ct DESC;
Note a subtle difference: This consumes each element when matched, so it does not count unmatched duplicates in data->'tags'
like the first variant does. For details see the demo below.
Also demonstrating an alternative way to pass the array parameter: as array literal: '{foo, bar}'
. This may be simpler to handle for some clients:
Or you could create a server side search function taking a VARIADIC
parameter and pass a variable number of plain text
values:
Related:
Be sure to have a functional GIN index to support the jsonb
existence operator ?|
:
CREATE INDEX tbl_dat_gin ON tbl USING gin (data->'tags');
Clarification as per request in the comment. Say, we have a JSON array with two duplicated tags (4 total):
jsonb '{"tags": ["foo", "bar", "foo", "bar"]}'
And search with an SQL array parameter including both tags, one of them duplicated (3 total):
'{foo, bar, foo}'::text[]
Consider the results of this demo:
SELECT * FROM (SELECT jsonb '{"tags":["foo", "bar", "foo", "bar"]}') t(data) , LATERAL ( SELECT count(*) AS ct FROM jsonb_array_elements_text(t.data->'tags') e WHERE e = ANY ('{foo, bar, foo}'::text[]) ) ct , LATERAL ( SELECT count(*) AS ct_intsct_all FROM ( SELECT * FROM jsonb_array_elements_text(t.data->'tags') INTERSECT ALL SELECT * FROM unnest('{foo, bar, foo}'::text[]) ) i ) ct_intsct_all , LATERAL ( SELECT count(DISTINCT e) AS ct_dist FROM jsonb_array_elements_text(t.data->'tags') e WHERE e = ANY ('{foo, bar, foo}'::text[]) ) ct_dist , LATERAL ( SELECT count(*) AS ct_intsct FROM ( SELECT * FROM jsonb_array_elements_text(t.data->'tags') INTERSECT SELECT * FROM unnest('{foo, bar, foo}'::text[]) ) i ) ct_intsct;
Result:
data | ct | ct_intsct_all | ct_dist | ct_intsct
-----------------------------------------+----+---------------+---------+----------
'{"tags": ["foo", "bar", "foo", "bar"]}' | 4 | 3 | 2 | 2
Comparing elements in the JSON array to elements in the array parameter:
ct
.ct_intsct_all
.ct_dist
or ct_intsct
.If you don't have dupes or if you don't care to exclude them, use one of the first two techniques. The other two are a bit slower (besides the different result), because they have to check for dupes.
I'm posting details of my solution in Ruby, in case it's useful to anyone tackling the same issue.
In the end I decided a scope isn't appropriate as the method will return the an array of objects (not a chainable ActiveRecord::Relation
), so I've written a class method and have provided a way to pass a chained scope to it through a block:
def self.with_any_tags(tags, &block)
composed_scope = (
block_given? ? yield : all
).where(["data->'tags' ?| ARRAY[:tags]", { tags: tags }])
t = Arel::Table.new('t', ActiveRecord::Base)
ct = Arel::Table.new('ct', ActiveRecord::Base)
arr_sql = Arel.sql "ARRAY[#{ tags.map { |t| Arel::Nodes::Quoted.new(t).to_sql }.join(', ') }]"
any_tags_func = Arel::Nodes::NamedFunction.new('ANY', [arr_sql])
lateral = ct
.project(Arel.sql('e').count(true).as('ct'))
.from(Arel.sql "jsonb_array_elements_text(t.data->'tags') e")
.where(Arel::Nodes::Equality.new Arel.sql('e'), any_tags_func)
query = t
.project(t[Arel.star])
.from(composed_scope.as('t'))
.join(Arel.sql ", LATERAL (#{ lateral.to_sql }) ct")
.order(ct[:ct].desc)
find_by_sql query.to_sql
end
This can be used like so:
Page.with_any_tags(['foo', 'bar'])
# SELECT "t".*
# FROM (
# SELECT "pages".* FROM "pages"
# WHERE data->'tags' ?| ARRAY['foo','bar']
# ) t,
# LATERAL (
# SELECT COUNT(DISTINCT e) AS ct
# FROM jsonb_array_elements_text(t.data->'tags') e
# WHERE e = ANY(ARRAY['foo', 'bar'])
# ) ct
# ORDER BY "ct"."ct" DESC
Page.with_any_tags(['foo', 'bar']) do
Page.published
end
# SELECT "t".*
# FROM (
# SELECT "pages".* FROM "pages"
# WHERE pages.published_at <= '2015-07-19 15:11:59.997134'
# AND pages.deleted_at IS NULL
# AND data->'tags' ?| ARRAY['foo','bar']
# ) t,
# LATERAL (
# SELECT COUNT(DISTINCT e) AS ct
# FROM jsonb_array_elements_text(t.data->'tags') e
# WHERE e = ANY(ARRAY['foo', 'bar'])
# ) ct
# ORDER BY "ct"."ct" DESC
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With