How can you emulate a Solr "more like this" query with PostgreSQL full text search?

I'd like to emulate this type of Solr query:

http://wiki.apache.org/solr/MoreLikeThis

with PostgreSQL using its full text search facility.

Is there a way to do something like a "more like this" query with pure PostgreSQL?

asked May 14 '12 15:05 by dan



1 Answer

Not out of the box, I'm afraid. It might be possible to compare two tsvectors to determine whether they are similar enough, or to pull the top n most similar tsvectors, but PostgreSQL has no built-in functionality for this. The good news is that since tsvectors support GIN indexing, the complicated part is done for you.

What I think you'd need to do is write a function in C that computes the intersection of two tsvectors. From there you could create a function that determines whether they overlap, and an operator that exposes it. After that, it shouldn't be too hard to build a ranking based on the largest overlap.
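To make the idea concrete, here is a minimal sketch of the overlap ranking in plain Python. It is not PostgreSQL code: ordinary sets of strings stand in for the lexemes of a tsvector, and the names `overlap`, `more_like_this`, and the sample `docs` are all hypothetical, chosen only for illustration. A real implementation would do the same counting inside a C or procedural-language function.

```python
from typing import Dict, List, Set, Tuple


def overlap(a: Set[str], b: Set[str]) -> int:
    """Count the lexemes two documents share (size of the intersection)."""
    return len(a & b)


def more_like_this(
    target_id: str, docs: Dict[str, Set[str]], n: int = 3
) -> List[Tuple[str, int]]:
    """Return up to n other documents, ranked by largest lexeme overlap.

    `docs` maps a document id to its set of lexemes; this is an assumed
    stand-in for a table with a tsvector column.
    """
    target = docs[target_id]
    scored = [
        (doc_id, overlap(target, lexemes))
        for doc_id, lexemes in docs.items()
        if doc_id != target_id
    ]
    # Drop documents with nothing in common, then sort by overlap
    # (descending) and by id (ascending) to keep ties deterministic.
    scored = [(doc_id, score) for doc_id, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


# Hypothetical sample data: lexeme sets as they might look after stemming.
docs = {
    "a": {"postgresql", "full", "text", "search"},
    "b": {"solr", "full", "text", "search", "index"},
    "c": {"postgresql", "gin", "index"},
    "d": {"cook", "recip"},
}

print(more_like_this("a", docs))
```

Raw overlap counts favour long documents, so a real ranking would probably normalize by document length or weight rare lexemes more heavily; that choice is left open here.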

I suspect this will be easiest to do in a language like C, but you could probably use other procedural languages as well if you need to.

The wonderful thing about PostgreSQL is that anything is possible. The downside, of course, is that the further you move from core functionality, the more of it you have to do yourself.

answered Oct 10 '22 01:10 by Chris Travers