I am using trigrams for full-text search in postgres like this:
SELECT *
FROM "Users" users
WHERE 'search_text' % ANY(regexp_split_to_array(users.name,E'\\s+'))
Query above checks if search_text
matches any word (split by whitespace
) in User.displayName
. This works however results are not ordered by actual "score".
Score can be calculated by similarity(text,text)
function.
The problem is that I have to order by them sum of all similarities found for each word in User.name
. So if user name is "A B C"
then its score should be:
similarity('search_text','A') + similarity('search_text','B') + similarity('search_text','C')
So I need to map user name words to scores and then sum (reduce) them. How I can do this in postgres?
In instead of splitting to an array split to a set then regroup summing the similarity
with users as (
select user_id, name
from (values (1, 'John Smith')) t(user_id, name)
)
select user_id, sum(similarity('smth', name_part))
from (
select user_id, regexp_split_to_table(name, E'\\s+') as name_part
from users
) users
where 'smth' % name_part
group by user_id
order by 2 desc
;
user_id | sum
---------+-------
1 | 0.375
I'm not sure you want to eliminate those name parts for which the similarity is lower than the threshold. If not just omit the where
clause
BTW, why do you need to split the name? Why not just compare the whole name to the search string?
with users as (
select user_id, name
from (values (1, 'John Smith')) t(user_id, name)
)
select similarity('jon smth', name), *
from users
order by 1 desc
;
similarity | user_id | name
------------+---------+------------
0.333333 | 1 | John Smith
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With