I want to add more words to the default "english" stopwords, e.g., "inc", "incorporated", "ltd" and "limited". How can I achieve this?
My current code to create an index is as follows. Thanks.
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding",
            "my_stop"
          ]
        }
      }
    }
  }
}
My test code:
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "House of Dickson<br> corp"
}
I've been able to combine custom stopwords with the standard English stopwords by chaining two stop filters, one for each list:
{
  "analysis": {
    "analyzer": {
      "my_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "custom_stop",
          "english_stop"
        ]
      }
    },
    "filter": {
      "custom_stop": {
        "type": "stop",
        "stopwords": ["custom1", "custom2", "custom3"]
      },
      "english_stop": {
        "type": "stop",
        "stopwords": "_english_"
      }
    }
  }
}
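Applied to the index from the question, the two-filter approach might look like the sketch below, sent as the body of PUT /my_index. The filter name company_stop is a hypothetical name chosen for illustration; the tokenizer, char filter and other token filters are copied from the question. Both stop filters are placed after lowercase so that "Ltd" and "LIMITED" are matched as well.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "company_stop": {
          "type": "stop",
          "stopwords": ["inc", "incorporated", "ltd", "limited"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "char_filter": ["html_strip"],
          "filter": [
            "lowercase",
            "asciifolding",
            "english_stop",
            "company_stop"
          ]
        }
      }
    }
  }
}
```

Re-running the _analyze request from the question with a text such as "House of Dickson Ltd" should then leave only the tokens house and dickson. Note that the whitespace tokenizer keeps punctuation attached, so a token like "ltd." (with a trailing period) would not be removed by this sketch.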
The set of "english" stopwords is the same as the set used by the Standard Analyzer.
You can create a file containing these words plus your additional stopwords, and point to it with the stopwords_path option (instead of the stopwords setting):
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords_path": "stopwords/custom_english.txt"
        }
      },
      ...
}
You can find more information on what the file should look like in the ES docs: it must be UTF-8 encoded, contain a single stopword per line, and be present on all nodes.
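For reference, here is a sketch of the merged word list written as a single inline stop filter. The first 33 entries are the default English stopwords as I understand Lucene defines them (an assumption worth checking against the documentation for your Elasticsearch version); the last four are the extra words from the question. The same words, one per line, would make up the file referenced by stopwords_path.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": [
            "a", "an", "and", "are", "as", "at", "be", "but", "by",
            "for", "if", "in", "into", "is", "it", "no", "not", "of",
            "on", "or", "such", "that", "the", "their", "then", "there",
            "these", "they", "this", "to", "was", "will", "with",
            "inc", "incorporated", "ltd", "limited"
          ]
        }
      }
    }
  }
}
```

With this variant the analyzer from the question can keep its single my_stop filter unchanged, at the cost of no longer tracking any changes to the built-in _english_ list.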