
Prevent mongodb from dying with 'state should be: open'

I'm using mongodb in a multithreaded clojure app, using the monger library, and one of my producer threads is dying with:

java.lang.IllegalStateException: state should be: open
 at com.mongodb.assertions.Assertions.isTrue (Assertions.java:70)
    com.mongodb.connection.DefaultServer.getConnection (DefaultServer.java:84)
    com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection (ClusterBinding.java:86)
    com.mongodb.operation.QueryBatchCursor.getMore (QueryBatchCursor.java:205)
    com.mongodb.operation.QueryBatchCursor.hasNext (QueryBatchCursor.java:103)
    com.mongodb.MongoBatchCursorAdapter.hasNext (MongoBatchCursorAdapter.java:46)
    com.mongodb.DBCursor.hasNext (DBCursor.java:155)
    clojure.lang.RT$4.invoke (RT.java:512)
    clojure.lang.LazySeq.sval (LazySeq.java:40)
    clojure.lang.LazySeq.seq (LazySeq.java:49)
    clojure.lang.RT.seq (RT.java:525)
    clojure.core$seq__6416.invokeStatic (core.clj:137)
    clojure.core$map$fn__6875.invoke (core.clj:2719)
    clojure.lang.LazySeq.sval (LazySeq.java:40)
    clojure.lang.LazySeq.seq (LazySeq.java:49)
    clojure.lang.RT.seq (RT.java:525)
    clojure.core$seq__6416.invokeStatic (core.clj:137)
    clojure.core$map$fn__6875.invoke (core.clj:2719)
    clojure.lang.LazySeq.sval (LazySeq.java:40)
    clojure.lang.LazySeq.seq (LazySeq.java:49)
    clojure.lang.RT.seq (RT.java:525)
    clojure.core$seq__6416.invokeStatic (core.clj:137)
    clojure.core$filter$fn__6902.invoke (core.clj:2782)
    clojure.lang.LazySeq.sval (LazySeq.java:40)
    clojure.lang.LazySeq.seq (LazySeq.java:49)
    clojure.lang.ChunkedCons.chunkedNext (ChunkedCons.java:59)
    clojure.lang.ChunkedCons.next (ChunkedCons.java:43)
    clojure.lang.RT.next (RT.java:703)
    clojure.core$next__6400.invokeStatic (core.clj:64)
    clojure.core$dorun.invokeStatic (core.clj:3115)
    clojure.core$doall.invokeStatic (core.clj:3121)
    clojure.core$doall.invoke (core.clj:3121)
    myapp.ns1.$somefn.invokeStatic (ns1.clj:93)
    myapp.ns1.$somefn.invoke (ns1.clj:90)
    myapp.ns1$anotherfn.invokeStatic (ns1.clj:124)
    myapp.ns1$anotherfn.invoke (ns1.clj:116)
    myapp.ns2$doit.invokeStatic (ns2:21)
    myapp.ns2$doit.invoke (ns2:17)
    myapp.ns2$producer$fn__11200.invoke (ns2:45)
    myapp.ns2$producer.invokeStatic (ns2:31)
    myapp.ns2$producer.invoke (ns2:25)
    myapp.ns2$_start$fn__11230.invoke (ns2:70)
    clojure.core$binding_conveyor_fn$fn__6766.invoke (core.clj:2020)
    clojure.lang.AFn.call (AFn.java:18)
    java.util.concurrent.FutureTask.run (FutureTask.java:266)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617)
    java.lang.Thread.run (Thread.java:745)

I've found a bunch of other hits for this problem and they were all solved by removing some conn.close() call somewhere.

I have a single connection which I create on startup and the only place I call close is during shutdown. The java driver manages a threadpool, so I'm not entirely sure what connection we're talking about either. Does the DbObject returned from queries have its own dedicated connection, and it is this connection that is dying?

I've tried fixing that by specifying :socket-keep-alive true, and explicitly setting :socket-timeout to 0 (which is the default and means unlimited) to no avail.

In monger there's some usage of with-open which I figured might cause the problem I'm having. On the off chance that there's some connection associated with the db object, being passed in here, which gets closed, I've tried removing all re-use of db objects, but that had no effect.

Another thought was that with-open might interact badly with the lazy stuff within, but wrapping everything in a doall to make it eager didn't have any effect either.
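For readers unfamiliar with the suspected interaction, here is a minimal sketch of how a lazy seq can outlive the resource opened by with-open. It uses a plain java.io reader as a stand-in for a database cursor, so it is only an analogy and not monger's actual code; the function names lazy-lines and eager-lines are made up for illustration.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io StringReader))

;; Hypothetical helper, broken on purpose: `map` is lazy, so the returned
;; seq is only partly realized when with-open closes the reader.
(defn lazy-lines [^String text]
  (with-open [rdr (io/reader (StringReader. text))]
    (map count (line-seq rdr))))

;; Hypothetical helper, fixed: `mapv` is eager, so every line is read
;; while the reader is still open.
(defn eager-lines [^String text]
  (with-open [rdr (io/reader (StringReader. text))]
    (mapv count (line-seq rdr))))

;; Realizing the lazy version after the reader is closed fails with an
;; IOException; the eager version returns the line lengths.
(println (try
           (doall (lazy-lines "a\nbb\nccc"))
           (catch java.io.IOException e
             (str "failed: " (.getMessage e)))))
(println (eager-lines "a\nbb\nccc"))
```

The same shape of bug is possible with any closeable resource, which is why forcing the seq with doall inside the with-open body is the usual first thing to try.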

I'm running against a replica set, and I'm running locally on the slave mongodb with ReadPreference/secondary.

Any other ideas as to what might be wrong?

expez asked Nov 02 '16 17:11


1 Answer

After peeling away layers of laziness in my app, the exception changed to something like "DB cursor not found". At that point it was obvious what was wrong: the cursor was timing out on the server before I had finished iterating over it. By managing my own cursor with notimeout, instead of using monger, the random errors disappeared.
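A rough sketch of what managing a no-timeout cursor through Java interop could look like, assuming the legacy com.mongodb.DB API from the 3.x Java driver (the one visible in the stack trace) is on the classpath. This is not the exact code from my app: the function name process-all! and the empty query are illustrative assumptions, and db is expected to be a com.mongodb.DB instance obtained however your app already obtains one.

```clojure
(import '(com.mongodb DB DBCollection DBCursor BasicDBObject Bytes))

;; Hypothetical helper: calls f on every document in the named collection.
;; Iteration is eager and happens entirely inside with-open, so no lazy
;; seq escapes the cursor's lifetime.
(defn process-all!
  [^DB db ^String coll-name f]
  (let [^DBCollection coll (.getCollection db coll-name)]
    (with-open [^DBCursor cursor (-> (.find coll (BasicDBObject.))
                                     ;; ask the server not to expire the idle cursor
                                     (.addOption Bytes/QUERYOPTION_NOTIMEOUT))]
      (while (.hasNext cursor)
        (f (.next cursor))))))
```

Usage would be along the lines of (process-all! db "events" handle-event), where both "events" and handle-event are placeholders. Because the server no longer reaps the cursor on its own, closing it explicitly (here via with-open) matters more than usual.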

expez answered Oct 11 '22 11:10