 

Adding Prefix Match to pg_search

I am following this Railscasts episode.

If I search for "Kerber", it returns the right article, but if I search for "Ke", it does not return that article.

Is there a way to fix this?

class Item < ActiveRecord::Base
  include PgSearch
  pg_search_scope :search, against: [:description, :about, :link, :twitterhandle, :builtby],
                  using: {tsearch: {dictionary: "english"}}

  # Run the full-text search scope only when a query was given;
  # otherwise return the unfiltered relation.
  def self.text_search(query)
    if query.present?
      search(query)
    else
      scoped
    end
  end
end
asked Dec 03 '25 16:12 by Sullivan


2 Answers

I'm the author and maintainer of pg_search.

You can add prefix: true to the configuration for the :tsearch search feature to have pg_search automatically add :* to the end of your queries.

https://github.com/Casecommons/pg_search#prefix-postgresql-84-and-newer-only

class Item < ActiveRecord::Base
  include PgSearch
  # prefix: true makes pg_search append :* to each search term (prefix matching)
  pg_search_scope :search, against: [:description, :about, :link, :twitterhandle, :builtby],
                  using: {tsearch: {prefix: true, dictionary: "english"}}

  def self.text_search(query)
    if query.present?
      search(query)
    else
      scoped
    end
  end
end
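For illustration, here is a plain-Ruby sketch of the kind of tsquery text that prefix matching relies on. The prefix_tsquery method is hypothetical and is not pg_search's internal code: it quotes each whitespace-separated term, doubles any embedded single quotes, appends :* to mark a prefix match, and joins the terms with &.

```ruby
# Hypothetical helper, NOT pg_search's implementation: it only shows the
# shape of a prefix tsquery built from user input.
def prefix_tsquery(query)
  query.split.map do |term|
    # Quote the term and double embedded single quotes so it is a valid
    # quoted lexeme in tsquery syntax.
    quoted = "'" + term.gsub("'", "''") + "'"
    # :* turns the term into a prefix match.
    "#{quoted}:*"
  end.join(" & ")
end

puts prefix_tsquery("Ke")          # prints 'Ke':*
puts prefix_tsquery("Ke O'Brien")  # prints 'Ke':* & 'O''Brien':*
```

In a real query, a string like this would be passed to to_tsquery as a bound parameter rather than interpolated into the SQL.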
answered Dec 06 '25 06:12 by Grant Hutchins


That result makes sense to me. Ke and Kerber are different words, so they don't match in full-text search.

Full-text search only does stemming (removing plurals and similar suffixes) so that cats matches cat. Even this isn't especially smart: irregular plurals like dice aren't handled. It also depends on the target language's stemming rules, so even if Kerber were the plural of Ke in some language, it wouldn't be stemmed to ke when the dictionary is set to english.
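Stemming can be illustrated with a toy example in Ruby. The toy_stem method below is hypothetical and far simpler than the Snowball stemmer used by PostgreSQL's english configuration; it only strips a plain trailing "s". It shows that normalisation maps each word to a single lexeme: cats becomes cat, dice is left alone, and Kerber and Ke stay two distinct lexemes.

```ruby
# Deliberately tiny suffix-stripping "stemmer", for illustration only.
def toy_stem(word)
  w = word.downcase
  # Strip one trailing "s" (cats -> cat), but leave words ending in "ss"
  # (glass) and very short words untouched.
  if w.length > 3 && w.end_with?("s") && !w.end_with?("ss")
    w.chomp("s")
  else
    w
  end
end

%w[Cats dice Kerber Ke].each do |word|
  puts "#{word} -> #{toy_stem(word)}"
end
# prints:
# Cats -> cat
# dice -> dice
# Kerber -> kerber
# Ke -> ke
```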

Compare the tsvector and tsquery values:

regress=> SELECT to_tsvector('Kerber'), to_tsquery('Kerber'), to_tsvector('ke'), to_tsquery('ke');
 to_tsvector | to_tsquery | to_tsvector | to_tsquery 
-------------+------------+-------------+------------
 'kerber':1  | 'kerber'   | 'ke':1      | 'ke'
(1 row)

and the matches:

regress=> SELECT to_tsvector('Kerber') @@ to_tsquery('Kerber'), to_tsvector('kerber') @@ to_tsquery('ke');
 ?column? | ?column?
----------+----------
 t        | f
(1 row)

I suspect that you want a tsearch prefix match. This is expressed with a :* wildcard:

regress=> SELECT to_tsvector('kerber') @@ to_tsquery('ke:*');
 ?column? 
----------
 t
(1 row)
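The difference between the exact and prefix results above can be modelled in a few lines of Ruby. This is a toy sketch that assumes a tsvector is just a set of lexemes (real tsvectors also carry positions and weights), and term_matches? is a hypothetical name: without prefix it tests exact lexeme equality, and with prefix: true it tests whether any lexeme starts with the term, like ke:*.

```ruby
require "set"

# Toy single-term match against a set of lexemes (a stand-in for a tsvector).
def term_matches?(lexemes, term, prefix: false)
  t = term.downcase # stand-in for the dictionary's lowercasing
  if prefix
    # Like 'term':* : any lexeme beginning with the term matches.
    lexemes.any? { |lexeme| lexeme.start_with?(t) }
  else
    # Like plain 'term' : only an identical lexeme matches.
    lexemes.include?(t)
  end
end

doc = Set["kerber"]
puts term_matches?(doc, "Kerber")            # prints true  (like 'kerber' @@ 'kerber')
puts term_matches?(doc, "ke")                # prints false (like 'kerber' @@ 'ke')
puts term_matches?(doc, "ke", prefix: true)  # prints true  (like 'kerber' @@ 'ke:*')
```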

This only matches prefixes: ke:* matches kerber, but not a word that merely contains ke later on, such as bike. It can have an impact on search efficiency, but I don't think it's a major one.

answered Dec 06 '25 07:12 by Craig Ringer


