I need to use a full text search to get the rows where the column contains the word O'henry
My code:
$word = "O'henry";
$sql = "SELECT * FROM mytable WHERE to_tsvector('english', col) @@ to_tsquery(:word) ";
$sth = $db->prepare( $sql );
$sth->execute(
array(":word"=>$word)
);
I have two problems:
1) This query matches columns which contains the word O'henry
, but it also matches columns containing, for example: "O yes, thierry henry is good sportsman."
2) If $word
begins with quotes, for example 'henry
, I got an error: syntax error in tsquery: "'henry"
, in spite of the fact that the search string was already escaped.
How can I solve this problem?
The default english dictionary is going to tokenize your data treating the '
as a it would a space. You can use ts_debug
to inspect how PostgreSQL/tsearch will treat your text:
psql=# SELECT * FROM ts_debug('english','o''henry');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+-------+----------------+--------------+---------
asciiword | Word, all ASCII | o | {english_stem} | english_stem | {o}
blank | Space symbols | ' | {} | |
asciiword | Word, all ASCII | henry | {english_stem} | english_stem | {henri}
(3 rows)
It's clear from this output that if you want to have postgres treat o'henry
as a single word, you are going to have to either:
tsvector
and tsquery
before use.I'd say the second option is by far the easiest:
$sql = "SELECT * FROM mytable WHERE plainto_tsvector('english', replace(col, '''','')) @@ to_tsquery(replace(:word,'''',''));"
After reading through the documentation, I believe that this is a limitation within the types of strings accepted by to_tsquery
. According to this page:
the input to to_tsquery must already follow the general rules for tsquery input
The tsquery input rules are defined here. Based on the examples given, the engine processes each input word into single-quoted strings, which will break on your input. As a workaround, I found a suggestion in this forum post (where the user was trying to properly escape the input 'ksan
:
select *
from items
where to_tsvector(name) @@ to_tsquery(E'[\']ksan')
To apply this to your situation, you would need to pre-format your input using something like this:
$word = 'E' . str_replace("'", "[\']", $word);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With