
Removing commas when using whitespace tokenizer

When using the whitespace tokenizer, a text like "there, he is." is split into "there,", "he" and "is.". Naturally, I want to remove the punctuation that the standard tokenizer would have removed automatically.

My questions are:

  1. How can I trim those punctuation marks in the Elasticsearch settings, for example by adding another token filter or char filter?
  2. I need the whitespace tokenizer mainly because I don't want hyphenated words to be split. Is there a way to achieve this while still using the standard tokenizer?
Dionysian asked Nov 26 '22 08:11



1 Answer

You can use a char filter to remove the ",". See Char Filter in the Elasticsearch documentation.
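
As a rough sketch of what that could look like: the Python snippet below builds an index settings body that combines a `pattern_replace` char filter with the `whitespace` tokenizer in a custom analyzer. The names `strip_punct` and `whitespace_no_punct` and the regex itself are assumptions made for illustration, not something from the question. The regex only drops punctuation that sits right before whitespace or the end of the input, so hyphenated words are left alone. The `simulate_analyzer` helper is only a local approximation for checking the regex; it does not call Elasticsearch.

```python
import re

# Assumed pattern: punctuation immediately before whitespace or end of input.
PUNCT_PATTERN = r"[,.!?;:]+(?=\s|$)"

# Settings body to send when creating the index.
# "strip_punct" and "whitespace_no_punct" are hypothetical names.
index_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                "strip_punct": {
                    "type": "pattern_replace",
                    "pattern": PUNCT_PATTERN,
                    "replacement": "",
                }
            },
            "analyzer": {
                "whitespace_no_punct": {
                    "type": "custom",
                    "char_filter": ["strip_punct"],
                    "tokenizer": "whitespace",
                }
            },
        }
    }
}


def simulate_analyzer(text):
    """Local approximation only: apply the regex, then split on whitespace."""
    return re.sub(PUNCT_PATTERN, "", text).split()


print(simulate_analyzer("there, he is."))  # ['there', 'he', 'is']
```

A `mapping` char filter would probably work too if you only care about a fixed set of characters, but note that it replaces every occurrence, including punctuation inside a token.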

user3340677 answered Dec 06 '22 17:12
