
How to index source code with ElasticSearch

I need to provide full-text search over JavaScript source files, with highlighting of the results.

Which combination of existing ElasticSearch tokenizers and analyzers would work best for this?

asked Oct 17 '11 at 17:10 by Arron S



1 Answer

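Interesting question, but I'm not aware of an out-of-the-box solution. You can use the word delimiter filter (`word_delimiter`; strictly speaking it is a token filter, not a tokenizer), which splits tokens on non-alphanumeric characters such as the underscore and, with `split_on_case_change` (enabled by default), on camelCase boundaries. Identifiers like hello_world or helloWorld then become searchable via hello or world. If the default character handling doesn't fit, its `type_table` option lets you reclassify individual characters.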
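A minimal sketch of that approach, written in Python as the request body you would send when creating the index (for example through the create-index API or a client library). The filter, analyzer and field names are hypothetical, and the mapping assumes a recent typeless Elasticsearch (7.x or later) rather than the 2011 version. The `whitespace` tokenizer is chosen so punctuation such as `_` and `(` is still present when the filter runs, and `term_vector: with_positions_offsets` is included because the question also asks for highlighting. `approx_word_delimiter` is only a rough local approximation for intuition; the `_analyze` API shows what the real analyzer emits.

```python
import json
import re


def build_code_index_body() -> dict:
    """Create-index request body with a source-code-oriented analyzer.

    Assumption: names `code_word_delimiter`, `code_analyzer` and `content`
    are hypothetical; the typeless mapping assumes Elasticsearch 7.x or later.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "code_word_delimiter": {
                        "type": "word_delimiter",
                        "split_on_case_change": True,  # helloWorld -> hello, World
                        "split_on_numerics": True,     # utf8 -> utf, 8
                        "preserve_original": True,     # also keep hello_world itself
                    }
                },
                "analyzer": {
                    "code_analyzer": {
                        "type": "custom",
                        # whitespace keeps punctuation so the filter can split on it
                        "tokenizer": "whitespace",
                        # lowercase runs after splitting so case changes are still visible
                        "filter": ["code_word_delimiter", "lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "content": {
                    "type": "text",
                    "analyzer": "code_analyzer",
                    # stores positions and offsets for the fast vector highlighter
                    "term_vector": "with_positions_offsets",
                }
            }
        },
    }


# Uppercase runs (e.g. XML), capitalized or lowercase words, and digit runs.
_PARTS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def approx_word_delimiter(token: str) -> list:
    """Rough local approximation for ONE whitespace token: the lowercased
    original followed by its lowercased sub-words. Not Elasticsearch's
    implementation; use the _analyze API to see the real token stream."""
    parts = [p.lower() for p in _PARTS.findall(token) if p != token]
    return [token.lower()] + parts


if __name__ == "__main__":
    print(json.dumps(build_code_index_body(), indent=2))
    for tok in "var x = hello_world(helloWorld);".split():
        print(tok, "->", approx_word_delimiter(tok))
```

Keeping the original token alongside its parts means an exact query for `hello_world` still matches, while `hello` or `world` alone also hit.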

But I doubt the results will be good enough on their own. You will probably have to implement a source code analyzer yourself, or use code that extracts the syntax tree and indexes method names and bodies into separate fields.
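The field-splitting idea can be sketched as follows, again in Python. A real implementation should take function names and bodies from a JavaScript parser such as Esprima or Acorn; here a deliberately naive regex plus brace counting stands in for the syntax tree, so it only recognizes `function name(...) { ... }` declarations and is confused by braces inside strings, comments, default parameters or regex literals. The field names `function_names` and `function_bodies` are hypothetical.

```python
import re

# Naive stand-in for a parser: only matches `function name(` declarations.
FUNC_DECL = re.compile(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\(")


def extract_functions(source: str) -> list:
    """Return (name, body) pairs by matching the declaration, then counting
    braces from the first `{` after it. Not a real JavaScript parser."""
    results = []
    for match in FUNC_DECL.finditer(source):
        open_brace = source.find("{", match.end())
        if open_brace == -1:
            continue
        depth = 0
        for i in range(open_brace, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    body = source[open_brace + 1:i].strip()
                    results.append((match.group(1), body))
                    break
    return results


def build_document(path: str, source: str) -> dict:
    """One document per file, with names and bodies in separate
    (hypothetical) fields next to the full source text."""
    functions = extract_functions(source)
    return {
        "path": path,
        "function_names": [name for name, _ in functions],
        "function_bodies": [body for _, body in functions],
        "content": source,
    }
```

Each field can then get its own mapping, for example `keyword` for exact function-name lookups and a code-aware analyzer for the bodies, and a query can boost matches in `function_names` above matches in `content`.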

answered Sep 30 '22 at 13:09 by Karussell
