 

org.apache.solr.common.SolrException: TransactionLog doesn't know how to serialize class org.bson.types.ObjectId; try implementing ObjectResolver?

When I run a data import from MongoDB, Solr throws the following error:

org.apache.solr.common.SolrException: TransactionLog doesn't know how to serialize class org.bson.types.ObjectId; try implementing ObjectResolver?
at org.apache.solr.update.TransactionLog$1.resolve(TransactionLog.java:100)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:234)
at org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:589)
at org.apache.solr.update.TransactionLog.write(TransactionLog.java:395)
at org.apache.solr.update.UpdateLog.add(UpdateLog.java:532)
at org.apache.solr.update.UpdateLog.add(UpdateLog.java:516)
at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:320)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:254)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:526)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:415)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
at java.lang.Thread.run(Thread.java:748)

My Solr version is 6.6.0. What could be the reason for the error and how can it be resolved?

asked Jul 25 '17 02:07 by XieWilliam


2 Answers

I came across this issue while trying to import data from multiple collections in MongoDB.

Assuming you are not using mongo-connector, this is the setup I used to import the data:

  • Solr-6.6.0
  • solr-dataimporthandler-6.6.0
  • mongo-java-driver-3.5.0
  • Solr Mongo importer

Since the returned '_id' is of type ObjectId, my workaround was to convert '_id' to a String before indexing it into Solr, and, when querying by '_id', to convert the String back to an ObjectId before running the query.
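The String form of an ObjectId is its 24-character hexadecimal representation (for example 56e9c892e4b0355017b2fa0f), and the bson ObjectId(String) constructor throws an IllegalArgumentException for anything else. When converting an indexed id back, it can help to check the string first. The sketch below is a hypothetical, JDK-only helper (ObjectIdHex and isObjectIdHex are illustrative names, not part of the importer); with mongo-java-driver on the classpath, ObjectId.isValid(String) performs the same check.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Solr or the Mongo importer): checks that a
 * String is a 24-character hexadecimal value, i.e. the form ObjectId.toString()
 * produces, before it is turned back into an ObjectId.
 */
final class ObjectIdHex {
    // Exactly 24 hex digits; ObjectId emits lowercase, the parser also accepts uppercase
    private static final Pattern HEX_24 = Pattern.compile("[0-9a-fA-F]{24}");

    private ObjectIdHex() {}

    static boolean isObjectIdHex(String s) {
        return s != null && HEX_24.matcher(s).matches();
    }
}
```

A caller would only invoke new ObjectId(id) when isObjectIdHex(id) returns true, and otherwise treat the value as a plain string.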

Download the Solr Mongo importer and make the following changes.

MongoMapperTransformer.java

import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataImporter;
import org.apache.solr.handler.dataimport.Transformer;

public class MongoMapperTransformer extends Transformer {

    // data-config.xml attribute naming the source MongoDB field
    public static final String MONGO_FIELD = "mongoField";

    // data-config.xml attribute marking the _id field to be stringified
    public static final String MONGO_ID = "objectIdToString";

    @Override
    public Object transformRow(Map<String, Object> row, Context context) {

        for (Map<String, String> map : context.getAllEntityFields()) {
            String mongoFieldName = map.get(MONGO_FIELD);
            String mongoId = map.get(MONGO_ID);
            if (mongoFieldName == null)
                continue;

            String columnFieldName = map.get(DataImporter.COLUMN);

            // If the field is an ObjectId, store its String form instead
            // (parseBoolean(null) is false, so unmarked fields skip this branch)
            if (Boolean.parseBoolean(mongoId)) {
                Object srcId = row.get(mongoFieldName);
                if (srcId != null) {
                    row.put(columnFieldName, srcId.toString());
                }
            } else {
                row.put(columnFieldName, row.get(mongoFieldName));
            }
        }

        return row;
    }
}
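To check the mapping logic without a running Solr, the loop above can be exercised against plain maps. This is a JDK-only simulation, not the importer's code: the list of attribute maps stands in for context.getAllEntityFields(), and FakeObjectId is a hypothetical stand-in whose toString() returns the hex string, as ObjectId's does.

```java
import java.util.List;
import java.util.Map;

/**
 * JDK-only simulation of the MongoMapperTransformer mapping loop.
 * Each map in "fields" holds the attributes of one <field> element
 * (column, mongoField, objectIdToString), like getAllEntityFields().
 */
final class MapperSimulation {

    /** Hypothetical stand-in for ObjectId: toString() yields the hex form. */
    static final class FakeObjectId {
        private final String hex;
        FakeObjectId(String hex) { this.hex = hex; }
        @Override public String toString() { return hex; }
    }

    static Map<String, Object> transformRow(Map<String, Object> row,
                                            List<Map<String, String>> fields) {
        for (Map<String, String> field : fields) {
            String mongoFieldName = field.get("mongoField");
            if (mongoFieldName == null) {
                continue; // field is not mapped from MongoDB
            }
            String columnFieldName = field.get("column");
            if (Boolean.parseBoolean(field.get("objectIdToString"))) {
                // Replace the id object with its String form, which Solr can serialize
                Object srcId = row.get(mongoFieldName);
                if (srcId != null) {
                    row.put(columnFieldName, srcId.toString());
                }
            } else {
                row.put(columnFieldName, row.get(mongoFieldName));
            }
        }
        return row;
    }
}
```

Passing a row whose _id holds a FakeObjectId through transformRow leaves a plain hex String under _id, while phone is copied unchanged.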

Next, replace the method

public Iterator<Map<String, Object>> getData(String query) {...}

in MongoDataSource.java with the following:

// Needs com.mongodb.BasicDBObject, com.mongodb.DBObject, com.mongodb.util.JSON
// and org.bson.types.ObjectId (mongo-java-driver) imported in MongoDataSource.java
@Override
public Iterator<Map<String, Object>> getData(String query) {

    DBObject queryObject;

    /* If querying by _id: the id was indexed as a String, so it has to be
     * converted back to an ObjectId using the ObjectId(String) constructor.
     * Note that this branch keeps only the _id condition of the query.
     */
    if (query.contains("_id")) {
        @SuppressWarnings("unchecked")
        Map<String, String> queryWithId = (Map<String, String>) JSON.parse(query);
        String id = queryWithId.get("_id");
        queryObject = new BasicDBObject("_id", new ObjectId(id));
    } else {
        queryObject = (DBObject) JSON.parse(query);
    }

    LOG.debug("Executing MongoQuery: " + query);

    long start = System.currentTimeMillis();
    mongoCursor = this.mongoCollection.find(queryObject);
    LOG.trace("Time taken for mongo: " + (System.currentTimeMillis() - start));

    ResultSetIterator resultSet = new ResultSetIterator(mongoCursor);
    return resultSet.getIterator();
}
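The query.contains("_id") test also fires for keys such as user_id; queryWithId.get("_id") then returns null and new ObjectId(null) fails. One possible stricter check, sketched below as a hypothetical JDK-only helper (IdQueryMatcher and extractObjectIdHex are illustrative names), accepts only a query of exactly the form {_id:'<24 hex chars>'} used in the data-config, with single or double quotes, and returns the id; any other query would fall through to the generic JSON.parse branch.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: recognizes an id-only query such as
 * {_id:'56e9c892e4b0355017b2fa0f'} or {"_id": "..."} and returns the hex id.
 */
final class IdQueryMatcher {
    // group 1: optional quote around the key (must match on both sides)
    // group 2: quote around the value; group 3: the 24 hex digits
    private static final Pattern ID_ONLY_QUERY = Pattern.compile(
            "\\{\\s*(['\"]?)_id\\1\\s*:\\s*(['\"])([0-9a-fA-F]{24})\\2\\s*\\}");

    private IdQueryMatcher() {}

    static Optional<String> extractObjectIdHex(String query) {
        if (query == null) {
            return Optional.empty();
        }
        Matcher m = ID_ONLY_QUERY.matcher(query.trim());
        return m.matches() ? Optional.of(m.group(3)) : Optional.empty();
    }
}
```

Inside getData, a present result would be passed to new ObjectId(...), and an empty one would go to JSON.parse(query) unchanged.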

After making these changes, build the jar with Ant.

Copy the jars (the Solr Mongo importer and mongo-java-driver) into a lib directory. I copied them into ${solr-install-dir}/contrib/dataimporthandler/lib, the same directory the lib directive below points to.

Add a lib directive for the above jars to solrconfig.xml:

<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />

Finally, here is an example of the MongoDB collections and the matching data-config.xml:

User collection
{
    "_id" : ObjectId("56e9c892e4b0355017b2fa0f"),
    "name" : "User1",
    "phone" : "123456789"
}

Address collection
{
    "_id" : ObjectId("56e9c892e4b0355017b2fa0f"),
    "address" : "#666, Maiden street"
}

data-config.xml

Do not forget to set objectIdToString="true" on the _id field so that MongoMapperTransformer stringifies the id.

<dataConfig>
   <dataSource name="MyMongo"
               type="MongoDataSource"
               database="test" />
   <document name="UserDetails">
      <!-- if query="" then it imports everything -->
      <entity name="users"
              processor="MongoEntityProcessor"
              query=""
              collection="user"
              dataSource="MyMongo"
              transformer="MongoMapperTransformer">
         <field column="_id" name="id" mongoField="_id" objectIdToString="true" />
         <field column="phone" name="phone" mongoField="phone" />

         <entity name="address"
                 processor="MongoEntityProcessor"
                 query="{_id:'${users._id}'}"
                 collection="address"
                 dataSource="MyMongo"
                 transformer="MongoMapperTransformer">
            <field column="address" name="address" mongoField="address" />
         </entity>
      </entity>
   </document>
</dataConfig>

In managed-schema, define the id field with the string type. Also, if you have nested objects in MongoDB, you will have to use script transformers to index them in Solr.
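Whatever transformer handles nested objects, a sub-document generally has to be turned into flat, top-level fields before Solr can index it. The sketch below shows only that flattening step in plain Java; it is a hypothetical helper (NestedFlattener is an illustrative name), and the "_" separator is an assumption. The flattened names must match fields or dynamic fields defined in the schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: flattens nested maps so that
 * {address: {street: "X"}} becomes {address_street: "X"}.
 */
final class NestedFlattener {
    private NestedFlattener() {}

    static Map<String, Object> flatten(Map<String, ?> doc, String separator) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", doc, separator, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, ?> doc,
                                    String separator, Map<String, Object> out) {
        for (Map.Entry<String, ?> e : doc.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + separator + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                // Recurse into the sub-document; its keys are assumed to be Strings
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                flattenInto(key, child, separator, out);
            } else {
                out.put(key, value);
            }
        }
    }
}
```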

Hope this helps. Good luck!

answered Sep 22 '22 15:09 by Pruthvik Narayanaswamy


According to the error message, you need to implement JavaBinCodec.ObjectResolver for the org.bson.types.ObjectId type so that Solr knows how to serialize instances of this class.

JavaBinCodec.ObjectResolver Documentation

public static interface JavaBinCodec.ObjectResolver

Allows extension of JavaBinCodec to support serialization of arbitrary data types. Implementors of this interface write a method to serialize a given object using an existing JavaBinCodec.

Once you have written your JavaBinCodec.ObjectResolver implementation, register it with JavaBinCodec, for example by passing it to the JavaBinCodec(ObjectResolver) constructor.

JavaBinCodec Documentation

public class JavaBinCodec extends Object

Defines a space-efficient serialization/deserialization format for transferring data. JavaBinCodec has built in support many commonly used types. This includes primitive types (boolean, byte, short, double, int, long, float), common Java containers/utilities (Date, Map, Collection, Iterator, String, Object[], byte[]), and frequently used Solr types (NamedList, SolrDocument, SolrDocumentList). Each of the above types has a pair of associated methods which read and write that type to a stream.

Classes that aren't supported natively can still be serialized/deserialized by providing an JavaBinCodec.ObjectResolver object that knows how to work with the unsupported class. This allows JavaBinCodec to be used to marshall/unmarshall arbitrary content.

NOTE -- JavaBinCodec instances cannot be reused for more than one marshall or unmarshall operation.
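To make the resolver idea concrete without depending on Solr, the sketch below reproduces the pattern with hypothetical stand-in types (ObjectResolver, MiniCodec and FakeObjectId are illustrative names, not Solr's or the Mongo driver's classes). The codec writes the types it supports natively and otherwise asks the resolver for a replacement; with no resolver it fails, much like the TransactionLog error above. This is simplified: here resolve returns a natively supported value or null, whereas Solr's real resolve(Object o, JavaBinCodec codec) also receives the codec so an implementation can write the value itself.

```java
import java.io.DataOutputStream;
import java.io.IOException;

final class ResolverDemo {

    /** Simplified stand-in for JavaBinCodec.ObjectResolver. */
    interface ObjectResolver {
        /** Returns a natively supported replacement, or null if it cannot help. */
        Object resolve(Object o);
    }

    /** Hypothetical stand-in for org.bson.types.ObjectId. */
    static final class FakeObjectId {
        private final String hex;
        FakeObjectId(String hex) { this.hex = hex; }
        String toHexString() { return hex; }
    }

    /** Minimal codec that natively supports only null, String and Long. */
    static final class MiniCodec {
        private final ObjectResolver resolver;
        private final DataOutputStream out;

        MiniCodec(ObjectResolver resolver, DataOutputStream out) {
            this.resolver = resolver;
            this.out = out;
        }

        void writeVal(Object val) throws IOException {
            if (writeKnownType(val)) return;
            // Unknown type: give the resolver a chance to map it to a known one
            Object resolved = (resolver == null) ? null : resolver.resolve(val);
            if (resolved != null && writeKnownType(resolved)) return;
            throw new IOException("don't know how to serialize " + val.getClass().getName());
        }

        private boolean writeKnownType(Object val) throws IOException {
            if (val == null) { out.writeByte(0); return true; }
            if (val instanceof String) { out.writeByte(1); out.writeUTF((String) val); return true; }
            if (val instanceof Long) { out.writeByte(2); out.writeLong((Long) val); return true; }
            return false;
        }
    }

    /** Resolver that serializes ObjectIds as their hex String. */
    static final ObjectResolver OBJECT_ID_AS_STRING =
            o -> (o instanceof FakeObjectId) ? ((FakeObjectId) o).toHexString() : null;
}
```

With OBJECT_ID_AS_STRING registered, writing a FakeObjectId succeeds as a String; with a null resolver the same call throws an IOException.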

answered Sep 24 '22 15:09 by Kerem Baydoğan