Does PostgreSQL use tf-idf?

Tags:

postgresql

I would like to know whether full text search in PostgreSQL 9.3 with a GIN/GiST index uses tf-idf (term frequency-inverse document frequency).

In particular, my columns of phrases contain some words that are very common and others that are quite unique (e.g., names). I want to index these columns so that matches on the unique words are weighted higher than matches on common words.

asked Aug 18 '13 06:08

AdamNYC

2 Answers

No, Postgres does not use TF-IDF as a similarity measure among documents.

ts_rank is higher if a document contains query terms more frequently. It does not take into account the global frequency of the term.

ts_rank_cd is higher if a document contains query terms closer together and more frequently. It does not take into account the global frequency of the term.

There is an extension from the text search creators called smlar that lets you calculate the similarity between arrays using TF-IDF. It also lets you turn tsvectors into arrays, and it supports fast indexing.

answered Oct 01 '22 05:10

Neil McGuigan

No. The ts_rank function has no native way to rank results by a term's global (corpus) frequency. The ranking algorithm does, however, take into account how often a term occurs within the document:

http://www.postgresql.org/docs/9.3/static/textsearch-controls.html

So if I search for "dog|chihuahua", the following two documents would have the same rank, even though "chihuahua" is the rarer word in the corpus:

"I want a dog"
"I want a chihuahua"

However, the following document would be ranked higher than the two above, because it contains the stemmed token "dog" twice:

"dog lovers have an average of 1.5 dogs"

In short: higher term frequency within the document results in a higher rank, but a lower term frequency in the corpus has no impact.
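
To make the difference concrete, here is a toy Python sketch that scores the three example documents twice: once by raw term count and once by TF-IDF. It is not PostgreSQL's ts_rank formula, only an illustration of what ignoring corpus frequency means. The names STOP_WORDS, tokenize, tf_score and tfidf_score are made up for this example, and the stop-word list and "stemmer" are far cruder than PostgreSQL's english configuration.

```python
import math
from collections import Counter

# Hypothetical stop-word list; PostgreSQL's 'english' configuration has its own.
STOP_WORDS = {"i", "a", "an", "of", "have", "the", "for"}


def tokenize(text):
    """Lowercase, drop stop words, and strip a trailing 's' as a toy stemmer."""
    tokens = []
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue
        if word.endswith("s") and len(word) > 3:
            word = word[:-1]
        tokens.append(word)
    return tokens


def tf_score(doc_tokens, query_terms):
    """Count-based score: how often the query terms occur in this document."""
    counts = Counter(doc_tokens)
    return sum(counts[term] for term in query_terms)


def tfidf_score(doc_tokens, query_terms, corpus_tokens):
    """TF-IDF score: term counts weighted by how rare the term is in the corpus."""
    n_docs = len(corpus_tokens)
    counts = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        # Document frequency: number of documents containing the term.
        df = sum(1 for tokens in corpus_tokens if term in tokens)
        if df:
            score += counts[term] * math.log(n_docs / df)
    return score


docs = [
    "I want a dog",
    "I want a chihuahua",
    "dog lovers have an average of 1.5 dogs",
]
corpus = [tokenize(d) for d in docs]
query = ["dog", "chihuahua"]

tf = [tf_score(tokens, query) for tokens in corpus]
tfidf = [tfidf_score(tokens, query, corpus) for tokens in corpus]

print(tf)     # [1, 1, 2]: the first two documents tie, the third wins
print(tfidf)  # the "chihuahua" document now scores highest
```

With the count-based score, the "dog" and "chihuahua" documents tie, which mirrors the behaviour described above. With TF-IDF, the "chihuahua" document comes out on top because that term appears in only one of the three documents.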

One caveat: text search does ignore stop words, so you will not match on very high-frequency words such as "the", "a", "of", and "for" (assuming you have set your language configuration correctly).

answered Oct 01 '22 06:10

mgoldwasser
