
Solr query - Is there a way to limit the size of a text field in the response

Is there a way to limit the amount of text returned for a text field in a query response? Here's a quick scenario.

I have 2 fields:

  • docId - int
  • text - string.

I will query the docId field and want to get a 200-character "preview" from the text field. The text field typically holds anywhere from 600 to 2000 characters, but I only need a preview.

e.g. [mySolrCore]/select?q=docId:123&fl=text

Is there any way to do it since I don't see the point of bringing back the entire text field if I only need a small preview?

I'm not looking at hit highlighting, since I'm not searching for specific text within the text field, but if there is functionality similar to the hl.fragsize parameter, that would be great!

Hope someone can point me in the right direction!

Cheers!

Dan asked Jan 25 '11 11:01



2 Answers

You would have to test the performance of this work-around versus just returning the entire field, but it might work for your situation. Basically, turn on highlighting on a field that won't match, and then use the alternate field to return the limited number of characters you want.

http://solr:8080/solr/select/?q=*:*&rows=10&fl=author,title&hl=true&hl.snippets=0&hl.fl=sku&hl.fragsize=0&hl.alternateField=description&hl.maxAlternateFieldLength=50

Notes:

  • Make sure your alternate field does not exist in the field list (fl) parameter
  • Make sure your highlighting field (hl.fl) does not actually contain the text you are searching for

I find that the CPU cost of running the highlighter is sometimes higher than the CPU and bandwidth cost of just returning the whole field. You'll have to experiment.
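
As a rough sketch, the query above could be assembled programmatically like this. The function name, its arguments, and the check on the field list are illustrative assumptions, not part of Solr; only the hl.* parameter names come from the URL shown earlier.

```python
from urllib.parse import urlencode


def build_preview_query(base_url, query, fields, hl_field, preview_field, max_chars):
    """Build a select URL that returns a truncated preview via the highlighter.

    All argument names here are hypothetical; the hl.* parameters mirror
    the example URL in this answer.
    """
    # Per the notes above: the alternate field should not be in fl,
    # otherwise the full field comes back anyway.
    if preview_field in fields:
        raise ValueError("preview field must not appear in the field list (fl)")

    params = {
        "q": query,
        "fl": ",".join(fields),
        "hl": "true",
        "hl.snippets": 0,
        "hl.fl": hl_field,  # a field that will not match the query
        "hl.fragsize": 0,
        "hl.alternateField": preview_field,
        "hl.maxAlternateFieldLength": max_chars,
    }
    return f"{base_url}?{urlencode(params)}"


# Example: a 200-character preview of "text" for one document.
url = build_preview_query(
    "http://solr:8080/solr/select/", "docId:123", ["docId"], "sku", "text", 200
)
print(url)
```

The preview should then show up in the highlighting section of the response rather than in the document itself, so the client has to read it from there.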

Aaron D answered Oct 01 '22 22:10



I decided to turn my comment into an answer.

I would suggest that you don't store your text data in Solr/Lucene. Only index the data for searching and store a unique ID or URL to identify the document. The contents of the document should be fetched from a separate storage system.

Solr/Lucene are optimized for searches. They aren't your data warehouse or database, and they shouldn't be used that way. When you store more data in Solr than necessary, you negatively impact your entire search system. You bloat the size of indices, increase replication time between masters and slaves, replicate data that you only need a single copy of, and waste cache memory on document caches that should be leveraged to make search faster.

So, I would suggest 2 things.

First, and optimally, remove the text storage entirely from your search index. Fetch both the preview text and the whole text from a secondary system that is optimized for holding documents, like a file server.

Second, and less optimally, store only the preview text in your search index. Store the entire document elsewhere, like a file server.
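
For the second option, the preview could be cut down before the document is sent to the index. This is only a minimal sketch: the "preview" field name, the helper names, and the 200-character limit (taken from the question) are assumptions, and the full text would be written to the separate store by other code.

```python
def make_preview(text, limit=200):
    """Return at most `limit` characters, cut at the last space when possible."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Avoid ending mid-word: drop everything after the last space, if any.
    head, _, _ = cut.rpartition(" ")
    return head or cut


def build_index_doc(doc_id, full_text, limit=200):
    """Build the document to index: the ID plus a short preview only.

    "preview" is a hypothetical stored field; the full text is kept elsewhere.
    """
    return {"docId": doc_id, "preview": make_preview(full_text, limit)}


doc = build_index_doc(123, "word " * 400)
print(len(doc["preview"]))
```

The query from the question then becomes a plain lookup with fl=preview, with no highlighter involved.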

rfeak answered Oct 01 '22 21:10
