I am adding a document to a Solr index from a Java program, but the add(inputDoc) call throws an exception. The log in the Solr web interface contains the following:
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="text" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[99, 111, 112, 101, 114, 116, 105, 110, 97, 32, 105, 110, 102, 111, 114, 109, 97, 122, 105, 111, 110, 105, 32, 113, 117, 101, 115, 116, 111, 32]...', original message: bytes can be at most 32766 in length; got 226781
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:239)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:457)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
... 40 more
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 226781
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:663)
... 47 more
Please, what should I do to solve this problem?
By adding content to an index, we make it searchable by Solr. A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from tables in a database, and files in common file formats such as Microsoft Word or PDF.
In Solr, a Document is the unit of search and index. An index consists of one or more Documents, and a Document consists of one or more Fields. In database terminology, a Document corresponds to a table row, and a Field corresponds to a table column.
As Solr in Action notes, in general your documents may contain fields that aren't useful from a search perspective but are still useful for displaying search results. In Solr, these are called stored fields.
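For example, a document sent to Solr in its XML update format is simply a set of named fields. The field names below are only illustrative and must match whatever is declared in your schema:

<add>
  <doc>
    <field name="id">1</field>
    <field name="title">An example document</field>
    <field name="text">The body text that gets indexed and searched.</field>
  </doc>
</add>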
I had the same problem and eventually solved it. Check the type of your "text" field; I suspect it is "strings".
You can find it in the managed-schema of the core:
<field name="text" type="strings"/>
Or open the Solr Admin UI, go to http://localhost:8983/solr/CORE_NAME/schema/fieldtypes?wt=json and search for "text". If you see something like the following, your "text" field is defined with the strings type:
{
  "name":"strings",
  "class":"solr.StrField",
  "multiValued":true,
  "sortMissingLast":true,
  "fields":["text"],
  "dynamicFields":["*_ss"]},
Then my solution should work for you: change the type from "strings" to "text_general" in managed-schema (and make sure the type of "text" in schema.xml is also "text_general"):
<field name="text" type="text_general">
This should solve your problem: strings is a string type (solr.StrField), which indexes the entire field value as one single term, while text_general is an analyzed text type that splits the value into tokens, so no single term exceeds Lucene's 32766-byte limit.
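For reference, the text_general field type in Solr's sample configsets looks roughly like the sketch below; the exact filters (stopwords, synonyms, etc.) vary between Solr versions, so treat this only as an illustration of a tokenized type:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <!-- index-time analysis: the value is split into tokens, so no single term becomes huge -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Note that after changing the field type you need to reload the core and re-index your documents, since analysis happens at index time.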
You have probably hit what is described in LUCENE-5472 [1]: Lucene now refuses to index a term that is too long and throws an error instead of silently skipping it. You could:
use, in the index analyzer, a LengthFilterFactory [2] to filter out tokens that don't fall within a requested length range
use, in the index analyzer, a TruncateTokenFilterFactory [3] to cap the maximum length of indexed tokens (a configuration sketch for both filters follows the links below)
use a custom UpdateRequestProcessor [4], though this really depends on your context
[1] https://issues.apache.org/jira/browse/LUCENE-5472
[2] https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
[3] https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.TruncateTokenFilterFactory
[4] https://wiki.apache.org/solr/UpdateRequestProcessor
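As a sketch of the first two options: the field type name text_limited and the length values below are just examples, not anything standard. Also keep in mind that LengthFilterFactory counts characters while Lucene's 32766 limit is in UTF-8 bytes, so leave plenty of headroom:

<fieldType name="text_limited" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- option [2]: drop tokens that are shorter than 1 or longer than 1000 characters -->
    <filter class="solr.LengthFilterFactory" min="1" max="1000"/>
    <!-- option [3]: alternatively, keep only the first 1000 characters of each token -->
    <!-- <filter class="solr.TruncateTokenFilterFactory" prefixLength="1000"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>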