I am using pg_trgm
to perform fuzzy string match where characters can be Chinese. Strangely, on my Ubuntu server, everything is fine, as following:
SELECT show_trgm('原作者');
> {0xa09182,0xcdfdbb,0x183afe,leD}
However, on my Mac, it does not work:
SELECT show_trgm('原作者');
> {}
I guess it is due to some strange encoding staff, but I examined all settings that I can possibly imagine, including:
SHOW SERVER_VERSION;
SHOW SERVER_ENCODING;
SHOW LC_COLLATE;
SHOW LC_CTYPE;
Where on Ubuntu it shows:
9.5.1
UTF8
en_US.UTF-8
en_US.UTF-8
and on Mac it shows:
9.5.3
UTF8
en_US.UTF-8
en_US.UTF-8
Also, the pg_trgm
versions are both 1.1, according to SELECT * FROM pg_extension
.
Could anyone help me to find why the pg_trgm
not works on Unicode on my Mac?
Reason for this is that pg_trgm
depends on libc
(system library shipped with OS) routines for classifying which characters are alphabetic and which aren't and this is (unfortunately) different between OSes. Apple Mac OS X is known for interpreting UTF-8 different way than other Unix/Unix-like systems. Character classification is different per-locale and is driven by category LC_CTYPE
(and envvar of same name).
Check output of postgres=# \l
and you should see Ctype
column which tells you how characters are classified in your database.
C
(seen that on Apple MacOS X before) try to create database again specifying CREATE DATABASE foo ... LC_CTYPE="en_US.UTF-8"
en_US.UTF-8
it is very likely MacOS X doesn't classify UTF-8 Chinese characters as alphabetic in this locale (not surprising). Try LC_CTYPE="zh_CN.UTF-8"
instead and that should work.In macOS, this is the issue of character encoding. Based on the language you have to explicitly flag the encoding type. default en_US.UTF-8 will definitely don't work. so:
Chinese : LC_CTYPE="zh_CN.UTF-8"
likewise, locale should be changed accordingly to the language. Although, there is no point of encode/decode Chinese in US English
You can create db:
CREATE DATABASE mydb WITH ENCODING='UTF8' LC_CTYPE='zh_CN.UTF-8' LC_COLLATE='zh_CN.UTF-8' OWNER=postgres TEMPLATE=template0 CONNECTION LIMIT=-1;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With