Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql: Full text search within lob possible?

We'd like to use PostgreSQL to store documents. As some of them might be up to 2 GB big, we have to use the lob-Datatype, where the large objects are stored in a separate table (pg_largeobject), referenced by an OID, as per the docs on large objects.

For us it's very important that these documents (.pdf, .doc, ...) can be searched. With the built-in PostgreSQL full text search tables and columns can be searched, but is it possible to search the large-objects in the pg_largeobject table as well?

If not, we have to use Oracle.

like image 350
vera Avatar asked Oct 29 '12 09:10

vera


People also ask

Is PostgreSQL good for full text search?

Yes, You Can Keep Full-Text Search in Postgres You can get even deeper and make your Postgres full-text search even more robust, by implementing features such as highlighting results, or writing your own custom dictionaries or functions.

How does Postgres full text search work?

In PostgreSQL, you use two functions to perform Full Text Search. They are to_tsvector() and to_tsquery(). Let's see how they work and to use them first. to_tsvector() function breaks up the input string and creates tokens out of it, which is then used to perform Full Text Search using the to_tsquery() function.

How do I find text in PostgreSQL function?

We have search text by using the to_tsvector function in PostgreSQL. In to_tsvector, “ts” is defined as text search. In to_tsvector, the tsvector is the data type of to_tsvector function. This function will return the lexeme tokens with pointers in PostgreSQL.

What is FTS in PostgreSQL?

To summarize, we learnt how to perform full-text search operation in PostgreSQL. If you liked our article, check out the book Mastering PostgreSQL 10 to understand how to perform operations such as indexing, query optimization, concurrent transactions, table partitioning, server tuning, and more.


1 Answers

There are at least two issues here.

  1. Full-text search doesn't really work on large objects stored as lob or oid-references. You cannot full-text index the contents of pg_largeobject.

  2. Full-text search is an indexing system for text. It cannot index PDF, Microsoft Word documents, or other random binary files. It does not have provision for text-extraction tool callbacks, etc.

You can either:

  • Create a table that contains text extracted from those files using external tools along with an oid that refers to the file its self, then full-text index that table of extracted text; or

  • Use a more powerful, full-featured external search system like Solr (based on Lucene) that's designed to cope with varying formats, do its own text extraction, etc.

like image 110
Craig Ringer Avatar answered Sep 24 '22 15:09

Craig Ringer