I don't really understand the difference between to_tsvector
and ::tsvector
in Postgres. I read the documentation on to_tsvector
here, but there doesn't seem to be any documentation for the other, ::tsvector
- that's a bit problematic. It is mentioned here, but it says something about normalizing before querying, and normalization is done through to_tsvector
.
I created this SQL Fiddle to demonstrate the two; here is the code if you don't want to navigate away:
DDL:
CREATE TABLE text (
text_id serial PRIMARY KEY,
source_text text NOT NULL,
destination_text text NOT NULL
);
and the SQL:
-- Throw some stuff in there
INSERT INTO text (source_text, destination_text) VALUES
('Hello', 'Hello Result'),
('With Comma, Query', 'WithComma, Result');
-- Forced to use punctuation in the query to match what is in the vector
SELECT T.source_text, T.destination_text
FROM text T
WHERE LOWER(T.source_text)::tsvector @@ LOWER('Hello')::tsquery;
-- Vector free of punctuation, don't include it in the query
SELECT T.source_text, T.destination_text
FROM text T
WHERE to_tsvector(LOWER(T.source_text)) @@ LOWER('Comma')::tsquery;
SELECT ts_debug('english', 'Something without a comma');
SELECT ts_debug('english', 'Something, with a comma');
It seems to me that to_tsvector
will take the text, strip it of punctuation and return the vector. On the other hand, ::tsvector
seems to include the punctuation in the vector, resulting in a need to use the same punctuation in the query.
What are the practical differences between the two? Is one generally preferred over the other? In what situations is each preferred?
A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details).
The type 'string' syntax is a generalization of the standard: SQL specifies this syntax only for a few data types, but PostgreSQL allows it for all types. The syntax with :: is historical PostgreSQL usage, as is the function-call syntax.
In PostgreSQL, you use two functions to perform Full Text Search. They are to_tsvector() and to_tsquery(). Let's see how they work and to use them first. to_tsvector() function breaks up the input string and creates tokens out of it, which is then used to perform Full Text Search using the to_tsquery() function.
Yes, You Can Keep Full-Text Search in Postgres You can get even deeper and make your Postgres full-text search even more robust, by implementing features such as highlighting results, or writing your own custom dictionaries or functions.
to_tsvector(text)
reads the string and does some normalization (taking language settings into account) on the string.
::tsvector
is a cast. It does not do any normalization (and does not care for language settings).
See: http://www.postgresql.org/docs/current/interactive/functions-textsearch.html
Some years ago a wrote an own to_tsvector() in python, since I was not happy with the way this was handled in postgres. This gave me more control.
To insert the data into the column I used the tsvector cast:
'UPDATE myapp_mymodel SET content=%s::tsvector where id=%s', [tsvector, self.id])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With