How to fix Java OutOfMemoryError: Java heap space from DataImportHandler?

Tags: java, solr, tomcat7

I am trying to import a large dataset (41 million records) into a new Solr index. I have set up the core and it works: I inserted some test docs and they index fine. I have set up data-config.xml as shown below and started a full-import. After about 12 hours, the import fails.

The document size can get quite large. Could the error be caused by a single large document (or field), or by the sheer volume of data going into the DataImportHandler?

How can I get this frustrating import task working?

I have included the Tomcat error log below.

Let me know if there is any info I have missed!

logs:

Jun 1, 2011 5:47:55 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity results with URL: jdbc:sqlserver://myserver;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor
Jun 1, 2011 5:47:56 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1185
Jun 1, 2011 5:48:02 PM org.apache.solr.core.SolrCore execute
INFO: [results] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
...
Jun 2, 2011 5:16:32 AM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:664)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding$StringDecoder.decode(Unknown Source)
    at java.lang.StringCoding.decode(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at com.microsoft.sqlserver.jdbc.DDC.convertStreamToObject(DDC.java:419)
    at com.microsoft.sqlserver.jdbc.ServerDTVImpl.getValue(dtv.java:1974)
    at com.microsoft.sqlserver.jdbc.DTV.getValue(dtv.java:175)
    at com.microsoft.sqlserver.jdbc.Column.getValue(Column.java:113)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(SQLServerResultSet.java:1982)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(SQLServerResultSet.java:1967)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject(SQLServerResultSet.java:2256)
    at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getObject(SQLServerResultSet.java:2265)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:286)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:228)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:266)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:260)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:78)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
    ... 5 more

Jun 2, 2011 5:16:32 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jun 2, 2011 5:16:44 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback

data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource"
        driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
        url="jdbc:sqlserver://myserver;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor"
        user="sa"
        password="password"/>
  <document>
    <entity name="results" query="SELECT fielda, fieldb, fieldc FROM mydb.[dbo].mytable WITH (NOLOCK)">
      <field column="fielda" name="fielda"/>
      <field column="fieldb" name="fieldb"/>
      <field column="fieldc" name="fieldc"/>
    </entity>
  </document>
</dataConfig>

solrconfig.xml snippet:

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>25</mergeFactor>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <maxFieldLength>100000</maxFieldLength>
  <writeLockTimeout>10000</writeLockTimeout>
  <commitLockTimeout>10000</commitLockTimeout>
</indexDefaults>
<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>25</mergeFactor>
  <infoStream file="INFOSTREAM.txt">true</infoStream>
</mainIndex>

Java heap settings: initial 128 MB, maximum 512 MB

Environment: Solr 3.1, Tomcat 7.0.12, Windows Server 2008, Java 6 update 25 (build 1.6.0_25-b06). The data comes from SQL Server 2008 R2.

/admin/stats.jsp - DataImportHandler
    Status : IDLE
    Documents Processed : 2503083
    Requests made to DataSource : 1
    Rows Fetched : 2503083
    Documents Deleted : 0
    Documents Skipped : 0
    Total Documents Processed : 0
    Total Requests made to DataSource : 0
    Total Rows Fetched : 0
    Total Documents Deleted : 0
    Total Documents Skipped : 0
    handlerStart : 1306759913518
    requests : 9
    errors : 0

EDIT: I am currently running a SQL query to find the longest single field value in any record, as I think an oversized field is probably the cause of the exception. I am also running the import again with jconsole attached to monitor heap usage.

EDIT: I read the Solr performance factors page. I am changing maxFieldLength to 1000000 and ramBufferSizeMB to 256. Now for another import run (yay...)

262

asked Jun 01 '11 23:06

Steve Casey

1 Answer

Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding$StringDecoder.decode(Unknown Source)
    at java.lang.StringCoding.decode(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at com.microsoft.sqlserver.jdbc.DDC.convertStreamToObject(DDC.java:419)

makes it pretty obvious that the MS JDBC driver is the one running out of RAM: the allocation fails inside the driver while it converts a column value into a String. Many JDBC drivers default to fetching all of their results into memory at once. So see if this can be tuned, or consider using the open-source jTDS driver, which is generally better behaved anyway.

I don't believe maxFieldLength is going to help you. It affects how much Lucene truncates, but not how much is initially transferred from the database. Another option is to transfer only a selection at a time, say 1 million rows, using TOP or ROW_NUMBER() and such for paging.
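To make the paging suggestion concrete, here is a minimal sketch that generates one query per page with ROW_NUMBER(). It is an illustration under assumptions, not a tested import recipe: the class name PagedQueries and the build helper are hypothetical, and ordering by fielda assumes that column is unique and stable, which the question does not confirm. The table and column names are taken from the data-config.xml in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one SELECT per page so each import run
// only pulls a bounded slice of the table.
class PagedQueries {

    // columns, table and orderBy are inserted verbatim, so they must come
    // from trusted configuration, never from user input.
    public static List<String> build(String columns, String table,
                                     String orderBy, long totalRows, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        // Row numbers from ROW_NUMBER() start at 1.
        for (long start = 1; start <= totalRows; start += pageSize) {
            long end = Math.min(start + pageSize - 1, totalRows);
            queries.add("SELECT " + columns
                    + " FROM (SELECT " + columns
                    + ", ROW_NUMBER() OVER (ORDER BY " + orderBy + ") AS rn"
                    + " FROM " + table + ") AS paged"
                    + " WHERE rn BETWEEN " + start + " AND " + end);
        }
        return queries;
    }

    public static void main(String[] args) {
        // Assumption: fielda is a suitable ordering key for mytable.
        List<String> pages = build("fielda, fieldb, fieldc",
                "mydb.[dbo].mytable", "fielda", 2_500_000L, 1_000_000L);
        for (String q : pages) {
            System.out.println(q);
        }
    }
}
```

With 2,500,000 rows and a page size of 1,000,000 this produces three queries, the last covering rows 2000001 to 2500000. Each generated statement could be used as the entity query for a separate import run. Note that ROW_NUMBER() over a large table makes the server re-sort or re-scan for every page, so paging on an indexed key range may be cheaper if such a key exists.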

123

answered Oct 26 '22 15:10

MJB
