Using http://wiki.apache.org/solr/TermVectorComponent I can get indexed terms and their frequencies for any document stored in my index. How can I get the same information for a text, without storing the text in my index? I just want SOLR to process the text and return the information, but without having to store the document in my index.
AFAIK this isn't possible without storing data in SOLR.
If you are looking to do text analysis (I understand this is broader than what you ask for), I would recommend the below alternatives:
I've also come across some python scripts that do term frequency analysis. Have a look at Mincemeat, particulary the example, which does term frequency calculation.
From what you ask for I conclude that you actually need a search library, not a full search engine (service). That library is Lucene. Perhaps, this will help for starters: How to extract Document Term Vector in Lucene 3.5.0. You could store the index in RAM for the sake of computing necessary bits and then get rid of the index.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With