 

What characters does the standard tokenizer delimit on?

I was wondering which characters Elasticsearch's standard tokenizer uses to delimit a string?

David Carek asked Sep 23 '15 14:09



1 Answer

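As per the documentation, I believe the standard tokenizer does not use a fixed list of delimiter characters. It splits on the word boundaries defined by the Unicode Text Segmentation algorithm (UAX #29): http://unicode.org/reports/tr29/#Default_Word_Boundaries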
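If you want to check the boundaries on your own text, one option is to ask Elasticsearch directly through the `_analyze` API with `"tokenizer": "standard"`. Below is a minimal sketch using only the Python standard library. The helper names `build_analyze_request` and `analyze` are made up for illustration, and `http://localhost:9200` is an assumed address for an unsecured local node, so adjust it for your setup.

```python
import json
import urllib.request


def build_analyze_request(text, base_url="http://localhost:9200"):
    """Build (but do not send) a POST to the _analyze endpoint.

    base_url is an assumption: a local node without authentication.
    """
    body = json.dumps({"tokenizer": "standard", "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, base_url="http://localhost:9200"):
    """Send the request and return the token strings from the response."""
    with urllib.request.urlopen(build_analyze_request(text, base_url)) as resp:
        payload = json.load(resp)
    return [t["token"] for t in payload["tokens"]]


# Inspect the request without touching the network.
req = build_analyze_request("The QUICK Brown-Foxes jumped over the dog's bone.")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With a node running, calling `analyze(...)` on the same sentence should show `Brown-Foxes` split at the hyphen while `dog's` stays a single token, since UAX #29 does not place a word boundary at an apostrophe between two letters.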

Andrei Stefan answered Oct 14 '22 09:10
