Following a question posted here about how I can increase the speed on one of my SQL Search methods, I was advised to update my table to make use of Full Text Search. This is what I have now done, using Gist indexes to make searching faster. On some of the "plain" queries I have noticed a marked increase which I am very happy about.
However, I am having difficulty in searching for partial words. For example I have several records that contain the word Squire (454) and I have several records that contain Squirrel (173). Now if I search for Squire it only returns the 454 records but I also want it to return the Squirrel records as well.
My query looks like this
SELECT title
FROM movies
WHERE vectors @@ to_tsoquery('squire');
I thought I could do to_tsquery('squire%')
but that does not work.
How do I get it to search for partial matches ?
Also, in my database I have records that are movies and others that are just TV Shows. These are differentiated by the "" over the name, so like "Munsters" is a TV Show, whereas The Munsters is the film of the show. What I want to be able to do is search for just the TV Show AND just the movies. Any idea on how I can achieve this ?
Regards Anthoni
In PostgreSQL, you use two functions to perform Full Text Search. They are to_tsvector() and to_tsquery(). Let's see how they work and to use them first. to_tsvector() function breaks up the input string and creates tokens out of it, which is then used to perform Full Text Search using the to_tsquery() function.
Introduction. Full-text search (FTS) is a technique used by search engines to find results in a database. It can be used to power search results on websites like shops, search engines, newspapers, and more.
And the more size you want to search in, the more Elasticsearch is better than PostgreSQL in performance. Additionally, you could also get many benefits and great performance if you pre-process the posts into several fields and indexes well before storing into Elasticsearch.
Try,
SELECT title FROM movies WHERE to_tsvector(title) @@ to_tsquery('squire:*')
This works on PostgreSQL 8.4+
Anthoni,
Assuming you plan on using only ASCII encoding (could be difficult, I'm aware), a very viable option may be the Trigram (pg_trgm) module: http://www.postgresql.org/docs/9.0/interactive/pgtrgm.html
Trigram utilizes built-in indexing methods such as Gist and Gin. The only modification you have to make is when defining your index, specify an Operator Class of either gist_trgm_ops
or gin_trgm_ops
.
If the contrib modules aren't already installed, in Ubuntu it's as easy and running the following command from the shell:
# sudo apt-get install postgresql-contrib
After the contrib modules are made available, you must install the pg_trgm extension into the database in question. You do this by executing the following PostgreSQL query on the database you wish to install the module into:
CREATE EXTENSION pg_trgm;
After the pg_trgm extension has been installed, we're ready to have some fun!
-- Create a test table. CREATE TABLE test (my_column text) -- Create a Trigram index. CREATE INDEX test_my_colun_trgm_idx ON test USING gist (my_column gist_trgm_ops); -- Add a couple records INSERT INTO test (my_Column) VALUES ('First Entry'), ('Second Entry'), ('Third Entry') -- Query using our new index -- SELECT my_column, similarity(my_column, 'Frist Entry') AS similarity FROM test WHERE my_column % 'Frist Entry' ORDER BY similarity DESC
@alexander-mera solution works great!
Note: Also make sure to convert spaces to +
. For example, if you are searching for squire knight
.
SELECT title FROM movies WHERE to_tsvector(title) @@ to_tsquery('squire+knight:*')
Even using LIKE
you will not be able to get 'squirrel' from squire%
because 'squirrel' has two 'r's. To get Squire and Squirrel you could run the following query:
SELECT title FROM movies WHERE vectors @@ to_tsquery('squire|squirrel');
To differentiate between movies and tv shows you should add a column to your database. However, there are many ways to skin this cat. You could use a sub-query to force postgres to first find the movies matching 'squire' and 'squirrel' and then search that subset to find titles that begin with a '"'. It is possible to create indexes for use in LIKE '"%...'
searches.
Without exploring other indexing possibilities you could also run these - mess around with them to find which is fastest:
SELECT title
FROM (
SELECT *
FROM movies
WHERE vectors @@ to_tsquery('squire|squirrel')
) t
WHERE title ILIKE '"%';
or
SELECT title
FROM movies
WHERE vectors @@ to_tsquery('squire|squirrel')
AND title ILIKE '"%';
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With