I need to provide full-text search over JavaScript source files, with highlighting of the results.
My question is: what combination of existing Elasticsearch tokenizers and analyzers would work best for this?
Interesting question, but I'm not aware of an out-of-the-box solution. You can use WordDelimiter (strictly speaking a token filter rather than a tokenizer): it lets you specify, for example, that the underscore should be handled as a digit, and then functions like hello_world (or helloWorld, if splitting on case changes is enabled) become searchable via hello or world.
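As a rough illustration of that idea, here is a minimal sketch of index settings that wire the word_delimiter token filter into a custom analyzer, written as a Python dict so it can be dumped to JSON. The names code_word_delimiter and code_analyzer are made up for this example, and the choice of the whitespace tokenizer is an assumption, not something the approach requires. The sketch relies on the filter's default behaviour of splitting on non-alphanumeric characters such as the underscore; per-character overrides like the one mentioned above would go in the filter's type_table option.

```python
import json


def build_code_index_settings() -> dict:
    """Build index settings for a code-oriented analyzer (illustrative only)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name; "word_delimiter" is the built-in type.
                    "code_word_delimiter": {
                        "type": "word_delimiter",
                        "generate_word_parts": True,    # emit the split sub-words
                        "split_on_case_change": True,   # split camelCase identifiers
                        "preserve_original": True,      # also keep the unsplit identifier
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name.
                    "code_analyzer": {
                        "type": "custom",
                        # Assumption: whitespace tokenizer, so identifiers reach the
                        # filter intact instead of being split earlier.
                        "tokenizer": "whitespace",
                        "filter": ["code_word_delimiter", "lowercase"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent when creating the index.
    print(json.dumps(build_code_index_settings(), indent=2))
```

It is worth running a few identifiers through the _analyze API to see which tokens actually come out before relying on this for highlighting.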
But I doubt that the results will be sufficient. In that case you'll have to implement a source code analyzer yourself, or use code that extracts the syntax tree, so that you can index method names and bodies into different fields.
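To show what "names and bodies in different fields" could look like, here is a deliberately naive sketch. It uses a regular expression and brace counting as a stand-in for a real JavaScript parser, so it only handles plain `function name(...) { ... }` declarations and will be fooled by braces inside strings, comments, or default parameter values. The field names function_name and function_body are hypothetical.

```python
import re

# Naive pattern for `function name(` declarations. A real parser would also
# cover arrow functions, object methods, and class members.
FUNCTION_DECL = re.compile(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\(")


def extract_functions(source: str) -> list:
    """Return one dict per function declaration, with name and body separated."""
    docs = []
    for match in FUNCTION_DECL.finditer(source):
        # First opening brace after the parameter list starts the body.
        start = source.find("{", match.end())
        if start == -1:
            continue
        depth = 0
        # Count braces until the one that opened the body is closed again.
        for pos in range(start, len(source)):
            if source[pos] == "{":
                depth += 1
            elif source[pos] == "}":
                depth -= 1
                if depth == 0:
                    docs.append({
                        "function_name": match.group(1),             # hypothetical field
                        "function_body": source[start + 1:pos].strip(),  # hypothetical field
                    })
                    break
    return docs


if __name__ == "__main__":
    sample = "function helloWorld(name) { return 'hi ' + name; }"
    print(extract_functions(sample))
```

Each resulting dict could then be indexed as its own document, with the name field and the body field mapped to different analyzers.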