Why does mongodb seem to save some binary objects and not others?

I'm not sure where to start or what information is relevant, so please let me know what additional information would help solve this problem.

I am developing a simple CometD application and I'm using MongoDB as my storage backend. I obtain a single Mongo instance when the application starts and use this instance for all queries. This is in fact what the Mongo Java driver documentation recommends: http://www.mongodb.org/display/DOCS/Java+Driver+Concurrency. I was grasping at straws, thinking that the issue had something to do with thread safety, but according to that link MongoDB is completely thread safe.

Here's where it gets interesting. I have a class that extends BasicDBObject.

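import com.mongodb.BasicDBObject;
import java.util.Map;

public class MyBasicDBObject extends BasicDBObject {

    private static final String MAP = "map";

    // Every value comes back from the map as Object, so each level
    // has to be cast back to the type that was stored.
    @SuppressWarnings("unchecked")
    public void updateMapAnd(String submap, String key, byte[] value) {
        Map<String, Object> topMap = (Map<String, Object>) this.get(MAP);
        Map<String, Object> embeddedMap = (Map<String, Object>) topMap.get(submap);
        byte[] oldValue = (byte[]) embeddedMap.get(key);

        byte[] newValue = UtilityClass.binaryAnd(oldValue, value);

        embeddedMap.put(key, newValue);
        topMap.put(submap, embeddedMap);
        this.put(MAP, topMap);
    }

    @SuppressWarnings("unchecked")
    public void updateMapXor(String submap, String key, byte[] value) {
        Map<String, Object> topMap = (Map<String, Object>) this.get(MAP);
        Map<String, Object> embeddedMap = (Map<String, Object>) topMap.get(submap);
        byte[] oldValue = (byte[]) embeddedMap.get(key);

        byte[] newValue = UtilityClass.binaryXor(oldValue, value);

        embeddedMap.put(key, newValue);
        topMap.put(submap, embeddedMap);
        this.put(MAP, topMap);
    }
}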

Next, two skeleton classes that extend MyBasicDBObject:

public class FirstDBObject extends MyBasicDBObject { /* no code */ }

public class SecondDBObject extends MyBasicDBObject { /* no code */ }

The only reason I've set up my classes this way is to improve code readability in dealing with these two objects within the same scope. This lets me do the following...

//a cometd service callback
public void updateMapObjectsFoo(ServerSession remote, Message message) {

    //locate the objects to update...
    FirstDBObject first = (FirstDBObject) firstCollection.findOne({ ... });
    SecondDBObject second = (SecondDBObject) secondCollection.findOne({ ... });

    //update them as follows
    first.updateMapAnd("default", "someKey1", newBinaryData1);
    second.updateMapAnd("default", "someKey2", newBinaryData2);

    //save (update) them to their respective collections
    firstCollection.save(first);
    secondCollection.save(second);
}

public void updateMapObjectsBar(ServerSession remote, Message message) {

    //locate the objects to update...
    FirstDBObject first = (FirstDBObject) firstCollection.findOne({ ... });
    SecondDBObject second = (SecondDBObject) secondCollection.findOne({ ... });

    /** 
     * the only difference is these two calls 
     */
    first.updateMapXor("default", "someKey1", newBinaryData1);
    second.updateMapXor("default", "someKey2", newBinaryData2);

    //save (update) them to their respective collections
    firstCollection.save(first);
    secondCollection.save(second);
}

UtilityClass does exactly what the method names say: bitwise & and bitwise ^, computed by iterating over the passed byte arrays.
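For reference, a minimal sketch of what UtilityClass might contain. The question only says the methods iterate over the byte arrays, so the equal-length requirement and the exception thrown on a mismatch are assumptions of this sketch.

```java
import java.util.Arrays;

public class UtilityClass {

    // Bitwise AND of two equal-length byte arrays; returns a new array.
    public static byte[] binaryAnd(byte[] a, byte[] b) {
        checkLengths(a, b);
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] & b[i]);
        }
        return out;
    }

    // Bitwise XOR of two equal-length byte arrays; returns a new array.
    public static byte[] binaryXor(byte[] a, byte[] b) {
        checkLengths(a, b);
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    // Assumption: mismatched lengths are treated as a programming error.
    private static void checkLengths(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "length mismatch: " + a.length + " vs " + b.length);
        }
    }

    public static void main(String[] args) {
        byte[] x = {0b0101, 0b1100};
        byte[] y = {0b0011, 0b1010};
        System.out.println(Arrays.toString(binaryAnd(x, y))); // [1, 8]
        System.out.println(Arrays.toString(binaryXor(x, y))); // [6, 6]
    }
}
```

Returning a fresh array, rather than overwriting oldValue in place, leaves the value read from the document unchanged until it is explicitly put back.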

This is where I'm totally lost. updateMapObjectsFoo() works exactly as expected: both first and second reflect the changes in the database. updateMapObjectsBar(), on the other hand, only manages to update first properly.

Debugging updateMapObjectsBar() shows that the binary values are in fact updated properly on both objects, but when I inspect the data in the mongo shell, first is updated in the DB and second is not. So why did I suspect thread safety at all? The only difference that bugs me is that secondCollection is used by other CometD services while firstCollection is not. On the one hand that seems relevant; on the other hand it doesn't, since Foo works and Bar does not.

I have torn the code apart and put it back together and I keep coming back to this same problem. What in the world is going on here?

It seems I left out the most relevant part of all: the nightmare of Java generics and the MongoDB driver's reliance on this feature of the language. BasicDBObject is essentially a wrapper for a Map<String, Object>. The problem is that once you store an object in that map, you must cast it back to whatever type it was when you put it in. Yes, that may seem completely obvious, and I knew it full well before posting this question.
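Because every value comes back from the map as Object, one way to make a wrong cast fail with a message that names the offending key is a small checked getter. The helper getAs below is hypothetical (it is not part of the MongoDB driver), and a plain LinkedHashMap<String, Object> stands in for BasicDBObject so the sketch runs without the driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypedGet {

    // Hypothetical helper: look up a key and cast the value to the expected
    // type, failing with a message naming the key and the stored type.
    public static <T> T getAs(Map<?, ?> doc, String key, Class<T> type) {
        Object value = doc.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing key: " + key);
        }
        if (!type.isInstance(value)) {
            throw new ClassCastException("key '" + key + "' holds "
                    + value.getClass().getName() + ", not " + type.getName());
        }
        return type.cast(value);
    }

    public static void main(String[] args) {
        // LinkedHashMap stands in for BasicDBObject so this runs without the driver.
        Map<String, Object> embedded = new LinkedHashMap<>();
        embedded.put("someKey1", new byte[] {1, 2});
        Map<String, Object> top = new LinkedHashMap<>();
        top.put("default", embedded);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("map", top);

        // Each level is cast back to the type that was stored.
        Map<?, ?> topMap = getAs(doc, "map", Map.class);
        Map<?, ?> embeddedMap = getAs(topMap, "default", Map.class);
        byte[] oldValue = getAs(embeddedMap, "someKey1", byte[].class);
        System.out.println(oldValue.length); // 2

        // A wrong guess about the stored type fails loudly and names the key.
        try {
            String wrong = getAs(embeddedMap, "someKey1", String.class);
            System.out.println(wrong);
        } catch (ClassCastException e) {
            System.out.println(e.getMessage());
        }
    }
}
```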

I cannot pinpoint exactly what happened, but I will offer this advice to Java + MongoDB users: you will be casting A LOT, and the more complicated your data structures, the more casts you will need. Long story short, don't do this:

DBObject obj = (DBObject) collection.findOne(new BasicDBObject("_id", new ObjectId((String)anotherObj.get("objId"))));

One-liners are tempting when you are doing rapid prototyping, but when you write them over and over you are bound to make mistakes. Write more code now and suffer less frustration later:

DBObject query = new BasicDBObject();
String objId = (String) anotherObj.get("objId");
query.put("_id", new ObjectId(objId));
DBObject obj = collection.findOne(query);

I think this is annoyingly verbose, but I should expect as much when interacting directly with Mongo instead of using some kind of library to make my life easier. I have made a fool of myself on this one, but hopefully someone will learn from my mistake and save themselves a lot of frustration.

Thanks to all for your help.

asked Sep 04 '12 02:09 by Alex W




1 Answer

It could very easily be a multi-threading issue. While you are correct that the Mongo, DB, and DBCollection objects are threadsafe if there is only one Mongo instance, DBObjects are not threadsafe. But even if they were threadsafe, your updateMapObjectsFoo/Bar methods do nothing to ensure that they are atomic operations on the database.

Unfortunately, the changes you would need to make to your code are more intense than just sprinkling a few "synchronized" keywords around. See if http://www.mongodb.org/display/DOCS/Atomic+Operations doesn't help you understand the scope of the problem and some potential solutions.
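The read-modify-write problem described above can be illustrated in memory. In this sketch, AtomicReference.compareAndSet stands in for a conditional database update (one whose query also matches the value that was read): read the current value, compute the XOR, and write only if nobody has replaced the value in the meantime, otherwise re-read and retry. This is an analogue chosen for illustration, not MongoDB driver code, and the class name UpdateIfCurrent is hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class UpdateIfCurrent {

    // In-memory stand-in for one stored binary field (e.g. map.default.someKey1).
    private final AtomicReference<byte[]> stored;

    public UpdateIfCurrent(byte[] initial) {
        stored = new AtomicReference<>(initial.clone());
    }

    // Read-modify-write with retry. compareAndSet compares references, so the
    // write only succeeds if the array we read is still the stored one.
    // Assumes mask has the same length as the stored value.
    public void xorWithRetry(byte[] mask) {
        while (true) {
            byte[] oldValue = stored.get();
            byte[] newValue = new byte[oldValue.length];
            for (int i = 0; i < oldValue.length; i++) {
                newValue[i] = (byte) (oldValue[i] ^ mask[i]);
            }
            if (stored.compareAndSet(oldValue, newValue)) {
                return;
            }
            // Another writer won the race; re-read and recompute.
        }
    }

    public byte[] current() {
        return stored.get().clone();
    }

    public static void main(String[] args) throws InterruptedException {
        UpdateIfCurrent field = new UpdateIfCurrent(new byte[] {0});
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // 1000 concurrent XORs: each of the 8 bits is toggled 125 times (odd),
        // so if no update is lost, every bit ends up set.
        for (int i = 0; i < 1000; i++) {
            final byte mask = (byte) (1 << (i % 8));
            pool.submit(() -> field.xorWithRetry(new byte[] {mask}));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(Arrays.toString(field.current())); // [-1]
    }
}
```

Against the database, the equivalent check is whether the conditional update actually modified a document; if it did not, another writer got there first and the read-modify-write has to be repeated.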

answered Nov 07 '22 13:11 by Fuwjax