
Full text search in PostgreSQL - Japanese, Chinese, Arabic

I'm designing a full-text search function in PostgreSQL for my current project. It works OK with ispell/myspell dictionaries so far. Now I need to add support for Chinese, Japanese and Arabic search. Where do I start? There are no templates or dictionaries available for those languages as far as I can see. Will it work with the pg_catalog.simple configuration?

stach asked Mar 22 '10 21:03


3 Answers

Just a hint from the manual: A large list of dictionaries is available on the OpenOffice Wiki.

Frank Heikens answered Sep 17 '22 14:09


Dictionaries won't help you much with Chinese. You'll need to look into n-gram tokenising.
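As a rough illustration of what n-gram tokenising means for text without spaces between words, here is a minimal Ruby sketch. It is not part of the original answer: the helper name char_ngrams and the default of n = 2 (bigrams) are assumptions made for the example. It only shows how a string is cut into overlapping character n-grams; how those tokens are then stored and indexed in PostgreSQL is left open.

```ruby
# Hypothetical helper (name and defaults are assumptions for this sketch):
# split text into overlapping character n-grams.
def char_ngrams(text, n = 2)
  # Drop all whitespace, including the ideographic space, then split into characters.
  chars = text.gsub(/[[:space:]]+/, "").chars
  return [] if chars.empty?
  # Text shorter than n yields a single token instead of nothing.
  return [chars.join] if chars.length < n
  # Slide a window of n characters across the text.
  chars.each_cons(n).map(&:join)
end

p char_ngrams("全文検索")
```

For the four-character input above, the helper returns the three bigrams 全文, 文検 and 検索. Running the same function over the search term gives tokens that can be compared against the tokens of the stored text.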

simon answered Sep 18 '22 14:09


A similar question on stackoverflow.com is "How do I implement full text search in Chinese on PostgreSQL?".

Even so, here is a detailed solution based on my own experience and on a solution I found on the Internet. I use SCWS together with zhparser for Chinese full-text search in Postgres.

Update (2016-01-31):
First check that postgresql-server-devel-{number version} is installed, because building the extension for PostgreSQL relies on PGXS, which that package provides.

Step 1: Install SCWS.
Note that ./configure must be followed by --prefix=/usr/local/scws. Do not run ./configure on its own in the fourth line below.

wget http://www.xunsearch.com/scws/down/scws-1.2.2.tar.bz2
tar xvjf scws-1.2.2.tar.bz2
cd scws-1.2.2
./configure --prefix=/usr/local/scws 
make
make install

To check that it installed successfully, run the following command:

ls -al /usr/local/scws/lib/libscws.la


Step 2: Install zhparser

git clone https://github.com/amutu/zhparser.git
cd zhparser
SCWS_HOME=/usr/local/scws/include make && make install

Update (2016-01-31): On Mac OS X Yosemite, use the SCWS_HOME value shown above. On Ubuntu 14.04 LTS, change the value of SCWS_HOME to /usr/local/scws instead.

Step 3: Configure a new extension using zhparser in Postgres
Step 3.1: Log in to your Postgres database from the terminal/command line

psql yourdatabasename

Step 3.2: Create the extension in Postgres. You can pick any name you like for the text search configuration (referred to below as the dictionary name).

CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION dictionarynameyouwant (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION dictionarynameyouwant ADD MAPPING FOR n,v,a,i,e,l WITH simple;


Once you have followed the steps above, you can use Postgres full-text search on Chinese/Mandarin text.
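To try the new configuration out, a query along the following lines can be used. This is only a sketch and not part of the original answer: the helper name zh_match_sql is hypothetical, and it assumes the configuration from Step 3.2 exists under the name you pass in. It builds a statement with $1 (the text to search) and $2 (the search terms) as placeholders, suitable for a driver call that accepts parameters, such as exec_params in the pg gem.

```ruby
# Hypothetical helper (not from the original answer): build a full-text
# match query for a given text search configuration name.
# $1 is the text to search in, $2 is the search terms.
def zh_match_sql(config_name)
  # Only accept plain lowercase identifiers, since the name is embedded in the SQL text.
  unless config_name.match?(/\A[a-z_][a-z0-9_]*\z/)
    raise ArgumentError, "unexpected configuration name: #{config_name.inspect}"
  end
  "SELECT to_tsvector('#{config_name}', $1) @@ plainto_tsquery('#{config_name}', $2) AS matched;"
end

puts zh_match_sql("dictionarynameyouwant")
```

The configuration name is passed to both to_tsvector and plainto_tsquery so that the stored text and the search terms are segmented by the same parser.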

Extra step (optional) for Rails when using the pg_search gem:
Step 4: Set the dictionary name in the :dictionary option of :tsearch in app/models/yourmodel.rb

class YourOwnClass < ActiveRecord::Base
    # ...
    include PgSearch
    # Use the name chosen in Step 3.2 as :dictionary.
    # Add further columns and :tsearch options as needed.
    pg_search_scope :functionnameyoulike,
        :against => [:columnsyoulike1, :columnsyoulike2],
        :using => { :tsearch => { :dictionary => "dictionarynameyouwant" } }
end

Reference:
1. SCWS install tutorial
2. [email protected]
3. Francs' Post - Postgres full-text search in Chinese with zhparser and SCWS
4. Rails365.net's Post - Postgres full-text search in Chinese with pg_search gem with zhparser
5. My Post at xuite.net - Make Postgres support full text search in Mandarin/Chinese

Howardsun answered Sep 17 '22 14:09