I know that you can find the most used terms in an index by using facets.
For example, given the following inputs:
"A B C"
"AA BB CC"
"A AA B BB"
"AA B"
a terms facet returns this:
B:3
AA:3
A:2
BB:2
CC:1
C:1
But I'm wondering whether it is possible to list the following instead:
AA B:2
A B:1
BB CC:1
....etc...
Is there such a feature in ElasticSearch?
As mentioned in ramseykhalaf's comment, a shingle filter produces tokens that are "n" words long, so a field analyzed with it holds word sequences as well as single words. For example, with these index settings and mapping:
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "shingle" : {
          "type" : "shingle",
          "max_shingle_size" : 5,
          "min_shingle_size" : 2,
          "output_unigrams" : "true"
        },
        "filter_stop" : {
          "type" : "stop",
          "enable_position_increments" : "false"
        }
      },
      "analyzer" : {
        "shingle_analyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["standard", "lowercase", "shingle", "filter_stop"]
        }
      }
    }
  },
  "mappings" : {
    "type" : {
      "properties" : {
        "letters" : {
          "type" : "string",
          "analyzer" : "shingle_analyzer"
        }
      }
    }
  }
}
See this blog post for full details.
I'm not sure whether Elasticsearch will let you do this natively the way you want. But you might be interested in checking out Carrot2 (http://search.carrot2.org) to accomplish what you want, and probably more.