 

Postgres - Full Text Search to accept emojis

I want to create a full text search that accepts emojis in the query, or some other kind of index for searching text. For example, given the text 'Playa 🌊🌞🌴 @CobolIquique h', PostgreSQL parses the emojis oddly.

While debugging with SELECT * FROM ts_debug('english', 'Playa 🌊🌞🌴 @CobolIquique h'); I get the following result:

[Screenshot: ts_debug output]

I don't understand why the emoji token is classified as a space symbol. If I debug the parser with SELECT * FROM ts_parse('default', 'Playa 🌊🌞🌴 @CobolIquique h'); I get the same tokens, and the token types listed by ts_token_type('default') include no emoji type (or anything similar). So how can I create a parser that splits the string on spaces correctly and doesn't treat emojis as blanks? Or how can I create a text index that supports emojis in queries?

asked Sep 27 '16 15:09 by FeanDoe



1 Answer

To create a parser that behaves differently from the default one, you have to write it in C as your own PostgreSQL extension. The extension must define the following functions:

start_function();
gettoken_function();
end_function();
lextypes_function();
headline_function(); // optional

For an example, look at the pg_tsparser module.

answered Oct 06 '22 02:10 by Artur
