Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fulltext search combined with fuzzysearch in PostgreSQL

I want to realize a fulltext search in postgresql combined with a kind of fuzzy search. For my testarea I followed up this article: https://blog.lateral.io/2015/05/full-text-search-in-milliseconds-with-postgresql/ and everything is working fine. But sometimes I have search cases with and without spaces in the searach string like this:

In my 'title'-column there is an entry like 'test123'. My searchstring looks like 'test 123' with a space in it. How can I get a hit in this testcase?

My search-sql-query looks like:

SELECT * 
FROM test, plainto_tsquery('test:*&123:*') as q 
WHERE (tsv @@ q)

result: 0 rows

So I tried to figure out if I can use pg_trgm combined with ts_vector but I can not find a solution. Do you have an idea?

like image 229
Chris Avatar asked Sep 08 '17 17:09

Chris


People also ask

How do I use full-text search in PostgreSQL?

When performing a full-text searches on a PostgreSQL database, you must implement the to_tsvector() and to_tsquery() functions together with a match operator @@. The match operator returns a boolean value (either t or f) when you run a keyword search (tsquery) against a tsvector document.

What is fuzzy search in PostgreSQL?

A fuzzy word leads to partial or unclear. This search deals with the result that does not see the accurate or perfect search. The fuzzy search of data in PostgreSQL contains different approaches. Either through extensions or algorithms. But the phenomenon is applied in both ways.

Is PostgreSQL good for full-text search?

Yes, You Can Keep Full-Text Search in Postgres You can get even deeper and make your Postgres full-text search even more robust, by implementing features such as highlighting results, or writing your own custom dictionaries or functions.

Is Elasticsearch faster than Postgres?

Elasticsearch is faster than Postgres when it comes to searching for data. Elasticsearch is a powerful search engine that is often faster than Postgres when it comes to searching for data. Elasticsearch can be used to search for documents, images, and other data stored in a database.


1 Answers

As the documentation on parsing states:

...plainto_tsquery will not recognize tsquery operators, weight labels, or prefix-match labels in its input...

plainto_tsquery and phraseto_tsquery are convenience functions which make it easier to search by a full string, but they don't support all of the features. Use to_tsquery instead which accepts the full search syntax:

SELECT * 
FROM test, to_tsquery('testing:* & 123:*') as q 
WHERE (tsv @@ q)

This function also requires you to normalize the search query in the same way you normalize the text you're searching by using to_tsvector, but that's pretty easy with some string functions:

SELECT string_agg(lexeme || ':*', ' & ' order by positions) 
FROM unnest(to_tsvector('testing 123'))

This basically gets the individual tokens from to_tsvector, appends :* to each, then joins them with & to create a single string. The example above takes testing 123 and produces testing:* & 123:* which you can then use directly with to_tsquery to get fuzzy matching with the normalization intact.

You can combine it all together into a CTE to make it simple:

WITH search AS (
    SELECT to_tsquery(string_agg(lexeme || ':*', ' & ' order by positions)) AS query
    FROM unnest(to_tsvector('enter your search query here'))
)
SELECT test.*
FROM test, query
WHERE (test.tsv @@ search.query)

This assumes that the table has a tsv column of datatype tsquery which is pregenerated instead of creating it on every query (which is much slower). PG12+ supports generated columns which can keep this updated automatically.

like image 192
Mani Gandham Avatar answered Sep 22 '22 14:09

Mani Gandham