SOLR: Create term vector (like data returned from TermVectorComponent) from raw text

Question

Using http://wiki.apache.org/solr/TermVectorComponent I can get indexed terms and their frequencies for any document stored in my index. How can I get the same information for a text, without storing the text in my index? I just want SOLR to process the text and return the information, but without having to store the document in my index.

Srikanth Venugopalan · Accepted Answer

AFAIK this isn't possible without storing data in SOLR.

If you are looking to do text analysis (I understand this is broader than what you ask for), I would recommend the below alternatives:

MAUI - keyphrase and terminology extraction
Gensim - topic modelling
Kea - keyword extraction

I've also come across some Python scripts that do term frequency analysis. Have a look at Mincemeat, particularly its example, which calculates term frequencies.
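For illustration, a term frequency calculation over raw text can be sketched in a few lines of plain Python. This is only a minimal sketch: the function name term_frequencies, the regex tokenizer, and the sample sentence are assumptions made for the example, and the tokenization (lowercasing and splitting on non-word characters) does not reproduce whatever analyzer chain a SOLR field would apply (stemming, synonyms, and so on).

```python
import re
from collections import Counter


def term_frequencies(text, stopwords=frozenset()):
    """Return a Counter mapping each term to its frequency in `text`.

    Tokenization here is an assumption: lowercase the text, then take
    runs of word characters. It is not equivalent to a SOLR analyzer.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Hypothetical sample text, used only to show the shape of the result.
sample = "Solr indexes text. Solr returns term vectors for indexed text."
tf = term_frequencies(sample, stopwords={"for"})

# Print terms from most to least frequent.
for term, freq in tf.most_common():
    print(term, freq)
```

The result is a mapping from term to count, which is roughly the "tf" part of what TermVectorComponent reports; positions, offsets, and document frequencies would need additional bookkeeping.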

D_K · Answer

From what you describe, I conclude that you actually need a search library, not a full search engine (service). That library is Lucene. As a starting point, this may help: How to extract Document Term Vector in Lucene 3.5.0. You could keep the index in RAM, compute what you need from it, and then discard the index.
