Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres to_tsvector VS ::tsvector

Tags:

postgresql

I don't really understand the difference between to_tsvector and ::tsvector in Postgres. I read the documentation on to_tsvectorhere, but there doesn't seem to be any documentation for the other, ::tsvector - that's a bit problematic. It is mentioned here, but it says something about normalizing before querying, and normalization is done through to_tsvector.

I created this SQL Fiddle to demonstrate the two; here is the code if you don't want to navigate away:

DDL:

CREATE TABLE text (
  text_id serial PRIMARY KEY,
  source_text text NOT NULL,
  destination_text text NOT NULL
);

and the SQL:

-- Throw some stuff in there
INSERT INTO text (source_text, destination_text) VALUES
('Hello', 'Hello Result'),
('With Comma, Query', 'WithComma, Result');

-- Forced to use punctuation in the query to match what is in the vector
SELECT T.source_text, T.destination_text
FROM   text T
WHERE  LOWER(T.source_text)::tsvector @@ LOWER('Hello')::tsquery;

-- Vector free of punctuation, don't include it in the query
SELECT T.source_text, T.destination_text
FROM   text T
WHERE  to_tsvector(LOWER(T.source_text)) @@ LOWER('Comma')::tsquery;

SELECT ts_debug('english', 'Something without a comma');

SELECT ts_debug('english', 'Something, with a comma');

It seems to me that to_tsvector will take the text, strip it of punctuation and return the vector. On the other hand, ::tsvector seems to include the punctuation in the vector, resulting in a need to use the same punctuation in the query.

What are the practical differences between the two? Is one generally preferred over the other? In what situations is each preferred?

like image 747
Chris Cirefice Avatar asked Jun 04 '14 18:06

Chris Cirefice


People also ask

What is Tsvector in Postgres?

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details).

What does :: means in PostgreSQL?

The type 'string' syntax is a generalization of the standard: SQL specifies this syntax only for a few data types, but PostgreSQL allows it for all types. The syntax with :: is historical PostgreSQL usage, as is the function-call syntax.

How do I do a full text search in PostgreSQL?

In PostgreSQL, you use two functions to perform Full Text Search. They are to_tsvector() and to_tsquery(). Let's see how they work and to use them first. to_tsvector() function breaks up the input string and creates tokens out of it, which is then used to perform Full Text Search using the to_tsquery() function.

Is PostgreSQL good for full text search?

Yes, You Can Keep Full-Text Search in Postgres You can get even deeper and make your Postgres full-text search even more robust, by implementing features such as highlighting results, or writing your own custom dictionaries or functions.


1 Answers

to_tsvector(text) reads the string and does some normalization (taking language settings into account) on the string.

::tsvector is a cast. It does not do any normalization (and does not care for language settings).

See: http://www.postgresql.org/docs/current/interactive/functions-textsearch.html

Some years ago a wrote an own to_tsvector() in python, since I was not happy with the way this was handled in postgres. This gave me more control.

To insert the data into the column I used the tsvector cast:

'UPDATE myapp_mymodel SET content=%s::tsvector where id=%s', [tsvector, self.id])
like image 117
guettli Avatar answered Sep 19 '22 13:09

guettli