Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I SQL query for words with punctuation in Postgresql?

If I have strings/phrases like this stored in the database:

  • What are Q-type Operations?
  • Programmer's Guide
  • A.B.C's of Coding

Is there a way to pass a query parameter in like "Programmers" or "abc" or "q-type" and have it find "Programmer's", "A.B.C" and "Q-type"?

like image 699
Lance Avatar asked Mar 18 '11 15:03

Lance


2 Answers

tsvector

Use the tsvector type, which is part of the PostgreSQL text-search feature.

postgres> select 'What are Q-type Operations?'::tsvector;
              tsvector               
-------------------------------------
 'Operations?' 'Q-type' 'What' 'are'
(1 row)

You can use familiar operators on tsvectors as well:

postgres> select 'What are Q-type Operations?'::tsvector
postgres>        || 'A.B.C''s of Coding'::tsvector;
                           ?column?                           
--------------------------------------------------------------
 'A.B.C''s' 'Coding' 'Operations?' 'Q-type' 'What' 'are' 'of'

From tsvector documentation:

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details). Sorting and duplicate-elimination are done automatically during input

If you also want to do language-specific normalization, like removing common words ('the', 'a', etc) and multiplies, use the to_tsvector function. It also assigns weights to different words for text search:

postgres> select to_tsvector('english',
postgres> 'What are Q-type Operations? A.B.C''s of Coding');
                      to_tsvector                       
--------------------------------------------------------
 'a.b.c':7 'code':10 'oper':6 'q':4 'q-type':3 'type':5
(1 row)

Full-blown text search

Obviously doing this for every row in a query will be expensive -- so you should store the tsvector in a separate column and use ts_query() to search for it. This also allows you to create a GiST index on the tsvector.

postgres> insert into text (phrase, tsvec)
postgres>   values('What are Q-type Operations?',
postgres>   to_tsvector('english', 'What are Q-type Operations?'));
INSERT 0 1

Searching is done using tsquery and the @@ operator:

postgres> select phrase from text where tsvec @@ to_tsquery('q-type');
           phrase            
-----------------------------
 What are Q-type Operations?
(1 row)
like image 137
intgr Avatar answered Oct 11 '22 08:10

intgr


You could try with an ILIKE with a TRANSLATE function, see here.

For example: translate(field, '.-\'', '')

like image 38
Joril Avatar answered Oct 11 '22 10:10

Joril