I am adding a document to a Solr index from a Java program, but the add(inputDoc) call throws an exception. The log in the Solr web interface contains the following:
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="text" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[99, 111, 112, 101, 114, 116, 105, 110, 97, 32, 105, 110, 102, 111, 114, 109, 97, 122, 105, 111, 110, 105, 32, 113, 117, 101, 115, 116, 111, 32]...', original message: bytes can be at most 32766 in length; got 226781
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:239)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:457)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
... 40 more
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 226781
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:663)
... 47 more
Please, what should I do to solve this problem?
By adding content to an index, we make it searchable by Solr. A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from tables in a database, and files in common file formats such as Microsoft Word or PDF.
In Solr, a Document is the unit of search and index. An index consists of one or more Documents, and a Document consists of one or more Fields. In database terminology, a Document corresponds to a table row, and a Field corresponds to a table column.
As Solr in Action notes, in general your documents may contain fields that aren't useful from a search perspective but are still useful for displaying search results. In Solr, these are called stored fields.
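For example, a document sent to Solr in its XML update format is simply a set of named fields. The field names below are only illustrative and must match whatever is declared in your schema:

<add>
  <doc>
    <field name="id">1</field>
    <field name="title">An example document</field>
    <field name="text">The body text that gets indexed and searched.</field>
  </doc>
</add>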
I had the same problem and eventually solved it. Check the type of your "text" field; I suspect it is "strings".
You can find it in the managed-schema of the core:
<field name="text" type="strings"/>
Or open the Solr Admin UI, go to http://localhost:8983/solr/CORE_NAME/schema/fieldtypes?wt=json and search for "text". If you see something like the following, your "text" field is defined with the strings type:
{
  "name":"strings",
  "class":"solr.StrField",
  "multiValued":true,
  "sortMissingLast":true,
  "fields":["text"],
  "dynamicFields":["*_ss"]},
Then my solution should work for you: change the type from "strings" to "text_general" in managed-schema (and make sure the type of "text" in schema.xml is also "text_general"):
<field name="text" type="text_general">
This should solve your problem: strings is a string type (solr.StrField), which indexes the entire field value as one single term, while text_general is an analyzed text type that splits the value into tokens, so no single term exceeds Lucene's 32766-byte limit.
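For reference, the text_general field type in Solr's sample configsets looks roughly like the sketch below; the exact filters (stopwords, synonyms, etc.) vary between Solr versions, so treat this only as an illustration of a tokenized type:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <!-- index-time analysis: the value is split into tokens, so no single term becomes huge -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Note that after changing the field type you need to reload the core and re-index your documents, since analysis happens at index time.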
You have probably hit what is described in LUCENE-5472 [1]: Lucene now refuses to index a term that is too long and throws an error instead of silently skipping it. You could:
use, in the index analyzer, a LengthFilterFactory [2] to filter out tokens that don't fall within a requested length range
use, in the index analyzer, a TruncateTokenFilterFactory [3] to cap the maximum length of indexed tokens (a configuration sketch for both filters follows the links below)
use a custom UpdateRequestProcessor [4], though this really depends on your context
[1] https://issues.apache.org/jira/browse/LUCENE-5472
[2] https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
[3] https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.TruncateTokenFilterFactory
[4] https://wiki.apache.org/solr/UpdateRequestProcessor
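As a sketch of the first two options: the field type name text_limited and the length values below are just examples, not anything standard. Also keep in mind that LengthFilterFactory counts characters while Lucene's 32766 limit is in UTF-8 bytes, so leave plenty of headroom:

<fieldType name="text_limited" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- option [2]: drop tokens that are shorter than 1 or longer than 1000 characters -->
    <filter class="solr.LengthFilterFactory" min="1" max="1000"/>
    <!-- option [3]: alternatively, keep only the first 1000 characters of each token -->
    <!-- <filter class="solr.TruncateTokenFilterFactory" prefixLength="1000"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>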