Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL: How to go around ts_vector size limitations?

I'm creating a search inside a Rails app using the pg_search gem. However, one of the tables have a Text datatype field that it's content happens to be a little larger than usual.

Now when I need to setup a tsvector column for the text columns, I face some limitations that due the the text field size vs tsvector size.

ERROR: string is too long for tsvector (5068741 bytes, max 1048575 bytes)

Is there any way that I determine condition to skip bigger Text fields while creating the tsvector column in the SQL trigger to do something like this:

pseudocode:

execute(<<-TRIGGERSQL)
CREATE OR REPLACE FUNCTION public.essays_before_insert_update_row_tr()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
    If (SELECT LEN(body_text) FROM essays) <= 1048575
      new.tsv_body_text := to_tsvector('pg_catalog.english', coalesce(new.body_text,''));
      RETURN NEW;
    End
END;
$function$
  TRIGGERSQL

  # no candidate create_trigger statement could be found, creating an adapter-specific one
  execute("CREATE TRIGGER essays_before_insert_update_row_tr BEFORE INSERT OR UPDATE ON \"essays\" FOR EACH ROW EXECUTE PROCEDURE essays_before_insert_update_row_tr()")

related question that I found without an answer:

Postgresql - converting text to ts_vector

like image 203
0bserver07 Avatar asked May 26 '15 22:05

0bserver07


People also ask

How do I do a full text search in PostgreSQL?

In PostgreSQL, you use two functions to perform Full Text Search. They are to_tsvector() and to_tsquery(). Let's see how they work and to use them first. to_tsvector() function breaks up the input string and creates tokens out of it, which is then used to perform Full Text Search using the to_tsquery() function.

What is Tsvector in PostgreSQL?

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details).

What is FTS in PostgreSQL?

Understanding Postgres Full Text Search: 10 Critical Aspects. Sarad on Database, PostgreSQL, Tutorial • August 28th, 2017 • Write for Hevo. Setting up PostgreSQL Full Text Search (FTS) helps users bring up a fully-fledged search engine without a steep learning curve and new software to install.


Video Answer


1 Answers

A simple workaround is to just invoke to_tsvector() with a truncated text value. For example, using the trigger example from the Postgres manual as starting point this approach looks like this:

CREATE FUNCTION essays_tsv_trigger_fn() RETURNS trigger AS $$
begin
    new.tsv_body_text := to_tsvector('english', left(new.body_text, 4*1024*1024));
    return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER essays_tsv_trigger BEFORE INSERT OR UPDATE
    ON essays FOR EACH ROW EXECUTE FUNCTION essays_tsv_trigger_fn();

This truncates the document's content to 4 MiB which should be useful enough for many document collections. Instead of just ignoring 'overly' long documents you include at least parts of it. In my experience, 4 MiB works well for technical english documentation. Depending on the size of the actually used vocabulary you could even succeed when truncating with a larger value like 10 MiB.

If you really want to ignore too long documents you could guard the to_tsvector() assignment with an if statement like this:

CREATE FUNCTION essays_tsv_trigger_fn() RETURNS trigger AS $$
begin
    if length(new.body_text) <= 4*1024*1024 then
        new.tsv_body_text := to_tsvector('english', new.body_text);
    end if;
  return new;
end
$$ LANGUAGE plpgsql;
like image 161
maxschlepzig Avatar answered Sep 28 '22 17:09

maxschlepzig