I'm considering adding semantic analysis to my Solr installation, but I don't know exactly where to start.
Basically, I'd like Solr to be able to find "similar" words (taken from the body of the indexed documents). For example, if I search for "music", I should be able to query the semantic engine and obtain "rock", "pop", etc. (provided, of course, that these words appeared near "music" in some of the indexed documents).
I found this project, but I don't know if it is the correct place to start: http://code.google.com/p/semanticvectors/
Semantic indexing is a good place to start. However, in my experience, this kind of technology doesn't work that well in practice. You often end up with very bizarre results. Also, because of Google, people have a certain expectation of how keyword search should behave, i.e. your search term should appear in the matching document.
You may use the Lucene Wordnet contrib package to look for synonyms.
Optimizing Findability in Lucene and Solr gives other ways to expand queries.
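To make the query expansion idea concrete, here is a rough sketch of what the question describes: count which words appear near each other in the document bodies, then add the most frequent neighbours to the query as lower-weighted alternatives. This is plain Python for illustration only, not Solr or semanticvectors code; the function names, the window size, the stopword set and the 0.5 boost are all assumptions I made up for the example. Only the `term^boost` and `OR` pieces are standard Lucene query syntax.

```python
import re
from collections import Counter, defaultdict


def build_cooccurrence(docs, window=2):
    """Count how often two words appear within `window` tokens of each other."""
    counts = defaultdict(Counter)
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        for i, word in enumerate(tokens):
            # Look at the next `window` tokens and record the pair both ways.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    counts[word][other] += 1
                    counts[other][word] += 1
    return counts


def related_terms(counts, term, top_n=3, stopwords=frozenset()):
    """Return the `top_n` most frequent neighbours of `term`, skipping stopwords."""
    neighbours = counts.get(term.lower(), Counter())
    ranked = [w for w, _ in neighbours.most_common() if w not in stopwords]
    return ranked[:top_n]


def expand_query(counts, term, boost=0.5, **kwargs):
    """Build a Lucene-style query: the original term OR its boosted neighbours."""
    extra = related_terms(counts, term, **kwargs)
    return " OR ".join([term] + [f"{w}^{boost}" for w in extra])


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for the indexed document bodies.
    docs = [
        "rock music and pop music",
        "pop music charts",
        "rock music festival",
    ]
    counts = build_cooccurrence(docs)
    print(expand_query(counts, "music", top_n=2, stopwords={"and"}))
```

Keeping the original term unboosted and the neighbours at a lower weight means documents that actually contain the search term still tend to rank first, which fits the expectation mentioned above. Raw co-occurrence counts on a real corpus will be noisy, so treat this as a starting point rather than a finished approach.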