PostgreSQL: How to get around tsvector size limitations?

Tags: postgresql, ruby-on-rails, rails-postgresql, tsvector, pg-search

I'm creating a search inside a Rails app using the pg_search gem. However, one of the tables has a text column whose content happens to be larger than usual.

Now, when I set up a tsvector column for the text column, I run into a limitation caused by the size of the text compared to the maximum tsvector size:

ERROR: string is too long for tsvector (5068741 bytes, max 1048575 bytes)

Is there a way to add a condition to the SQL trigger that skips oversized text values when populating the tsvector column, something like this:

pseudocode:

execute(<<-TRIGGERSQL)
CREATE OR REPLACE FUNCTION public.essays_before_insert_update_row_tr()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
    -- check the row being written (NEW), not the whole essays table
    IF length(coalesce(new.body_text, '')) <= 1048575 THEN
      new.tsv_body_text := to_tsvector('pg_catalog.english', coalesce(new.body_text, ''));
    END IF;
    -- always return the row; reaching END without RETURN raises an error
    RETURN NEW;
END;
$function$
  TRIGGERSQL

  # no candidate create_trigger statement could be found, creating an adapter-specific one
  execute("CREATE TRIGGER essays_before_insert_update_row_tr BEFORE INSERT OR UPDATE ON \"essays\" FOR EACH ROW EXECUTE PROCEDURE essays_before_insert_update_row_tr()")

A related question I found, which has no answer:

Postgresql - converting text to ts_vector


asked May 26 '15 22:05

0bserver07


1 Answer

A simple workaround is to invoke to_tsvector() with a truncated text value. For example, using the trigger example from the PostgreSQL manual as a starting point, this approach looks like this:

CREATE FUNCTION essays_tsv_trigger_fn() RETURNS trigger AS $$
begin
    -- build the tsvector from a prefix of the document instead of the whole text
    new.tsv_body_text := to_tsvector('english', left(new.body_text, 4*1024*1024));
    return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER essays_tsv_trigger BEFORE INSERT OR UPDATE
    ON essays FOR EACH ROW EXECUTE FUNCTION essays_tsv_trigger_fn();

This truncates the document's content to its first 4*1024*1024 characters (4 MiB for single-byte text), which should be good enough for many document collections. Instead of ignoring overly long documents entirely, you index at least part of each one. In my experience, 4 MiB works well for technical English documentation. Since the roughly 1 MiB limit applies to the generated tsvector rather than to the input text, how much text fits depends on the size of the vocabulary actually used; you might even succeed with a larger value like 10 MiB.

If you really want to skip documents that are too long, you can guard the to_tsvector() assignment with an IF statement:

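CREATE OR REPLACE FUNCTION essays_tsv_trigger_fn() RETURNS trigger AS $$
begin
    if length(new.body_text) <= 4*1024*1024 then
        new.tsv_body_text := to_tsvector('english', new.body_text);
    else
        -- clear the column so an UPDATE with an oversized body doesn't keep a stale tsvector
        new.tsv_body_text := null;
    end if;
    return new;
end
$$ LANGUAGE plpgsql;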
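The question installs its trigger from a Rails migration via execute. As a minimal sketch of doing the same with the answer's trigger, the hypothetical TsvTriggerSql module below (not part of Rails or pg_search) only builds the SQL strings: mode: :truncate follows the first variant, and mode: :skip follows the second one, additionally storing NULL for oversized documents so an UPDATE doesn't leave a stale tsvector behind. Deriving the function and trigger names from the table name is an assumption that happens to match the answer's essays_tsv_trigger_fn / essays_tsv_trigger; EXECUTE FUNCTION requires PostgreSQL 11 or later.

```ruby
# Hypothetical helper (not part of Rails or pg_search): it only builds SQL
# strings for the trigger from the answer, so a migration can pass them to
# `execute`. Identifiers are interpolated as-is, so only pass trusted names.
module TsvTriggerSql
  # The answer's cut-off: 4*1024*1024 characters.
  DEFAULT_MAX_CHARS = 4 * 1024 * 1024

  # Assumed naming convention, matching the answer's essays_tsv_trigger_fn.
  def self.function_name(table)
    "#{table}_tsv_trigger_fn"
  end

  # mode: :truncate -> index only the first max_chars characters (first variant)
  #       :skip     -> store NULL for longer documents (second variant)
  def self.create_function(table:, source:, target:, config: "english",
                           max_chars: DEFAULT_MAX_CHARS, mode: :truncate)
    limit = Integer(max_chars)
    assignment =
      case mode
      when :truncate
        "new.#{target} := to_tsvector('#{config}', left(new.#{source}, #{limit}));"
      when :skip
        "if length(new.#{source}) <= #{limit} then " \
          "new.#{target} := to_tsvector('#{config}', new.#{source}); " \
          "else new.#{target} := null; end if;"
      else
        raise ArgumentError, "unknown mode: #{mode.inspect}"
      end

    <<~SQL
      CREATE OR REPLACE FUNCTION #{function_name(table)}() RETURNS trigger AS $$
      begin
          #{assignment}
          return new;
      end
      $$ LANGUAGE plpgsql;
    SQL
  end

  def self.create_trigger(table:)
    <<~SQL
      CREATE TRIGGER #{table}_tsv_trigger BEFORE INSERT OR UPDATE
          ON #{table} FOR EACH ROW EXECUTE FUNCTION #{function_name(table)}();
    SQL
  end
end

# Possible use in a migration (Migration[7.0] is an assumption; use your app's version):
#
#   class AddTsvTriggerToEssays < ActiveRecord::Migration[7.0]
#     def up
#       execute TsvTriggerSql.create_function(table: "essays", source: "body_text",
#                                             target: "tsv_body_text")
#       execute TsvTriggerSql.create_trigger(table: "essays")
#     end
#
#     def down
#       execute "DROP TRIGGER IF EXISTS essays_tsv_trigger ON essays"
#       execute "DROP FUNCTION IF EXISTS essays_tsv_trigger_fn()"
#     end
#   end
```

Generating the SQL in plain Ruby keeps the migration readable and lets you switch between truncating and skipping by changing one argument.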


answered Sep 28 '22 17:09

maxschlepzig
