Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does the MongoDB Java driver use a random number generator in a conditional?

People also ask

Is MongoDB based on Java?

Overview. In this article, we'll have a look at integrating MongoDB, a very popular NoSQL open source database with a standalone Java client. MongoDB is written in C++ and has quite a number of solid features such as map-reduce, auto-sharding, replication, high availability etc.

Is MongoDB synchronous or asynchronous?

The MongoDB Async driver provides an asynchronous API that can leverage either Netty or Java 7's AsynchronousSocketChannel for fast and non-blocking IO.


After inspecting the history of that line, my main conclusion is that there has been some incompetent programming at work.

  1. That line is gratuitously convoluted. The general form

    a? true : b
    

    for boolean a, b is equivalent to the simple

    a || b
    
  2. The surrounding negation and excessive parentheses convolute things further. Keeping in mind De Morgan's laws it is a trivial observation that this piece of code amounts to

    if (!_ok && Math.random() <= 0.1)
      return res;
    
  3. The commit that originally introduced this logic had

    if (_ok == true) {
      _logger.log( Level.WARNING , "Server seen down: " + _addr, e );
    } else if (Math.random() < 0.1) {
      _logger.log( Level.WARNING , "Server seen down: " + _addr );
    }
    

    —another example of incompetent coding, but notice the reversed logic: here the event is logged if either _ok or in 10% of other cases, whereas the code in 2. returns 10% of the times and logs 90% of the times. So the later commit ruined not only clarity, but correctness itself.

    I think in the code you have posted we can actually see how the author intended to transform the original if-then somehow literally into its negation required for the early return condition. But then he messed up and inserted an effective "double negative" by reversing the inequality sign.

  4. Coding style issues aside, stochastic logging is quite a dubious practice all by itself, especially since the log entry does not document its own peculiar behavior. The intention is, obviously, reducing restatements of the same fact: that the server is currently down. The appropriate solution is to log only changes of the server state, and not each its observation, let alone a random selection of 10% such observations. Yes, that takes just a little bit more effort, so let's see some.

I can only hope that all this evidence of incompetence, accumulated from inspecting just three lines of code, does not speak fairly of the project as a whole, and that this piece of work will be cleaned up ASAP.


https://github.com/mongodb/mongo-java-driver/commit/d51b3648a8e1bf1a7b7886b7ceb343064c9e2225#commitcomment-3315694

11 hours ago by gareth-rees:

Presumably the idea is to log only about 1/10 of the server failures (and so avoid massively spamming the log), without incurring the cost of maintaining a counter or timer. (But surely maintaining a timer would be affordable?)


Add a class member initialized to negative 1:

  private int logit = -1;

In the try block, make the test:

 if( !ok && (logit = (logit + 1 ) % 10)  == 0 ) { //log error

This always logs the first error, then every tenth subsequent error. Logical operators "short-circuit", so logit only gets incremented on an actual error.

If you want the first and tenth of all errors, regardless of the connection, make logit class static instead of a a member.

As had been noted this should be thread safe:

private synchronized int getLogit() {
   return (logit = (logit + 1 ) % 10);
}

In the try block, make the test:

 if( !ok && getLogit() == 0 ) { //log error

Note: I don't think throwing out 90% of the errors is a good idea.


I have seen this kind of thing before.

There was a piece of code that could answer certain 'questions' that came from another 'black box' piece of code. In the case it could not answer them, it would forward them to another piece of 'black box' code that was really slow.

So sometimes previously unseen new 'questions' would show up, and they would show up in a batch, like 100 of them in a row.

The programmer was happy with how the program was working, but he wanted some way of maybe improving the software in the future, if possible new questions were discovered.

So, the solution was to log unknown questions, but as it turned out, there were 1000's of different ones. The logs got too big, and there was no benefit of speeding these up, since they had no obvious answers. But every once in a while, a batch of questions would show up that could be answered.

Since the logs were getting too big, and the logging was getting in the way of logging the real important things he got to this solution:

Only log a random 5%, this will clean up the logs, whilst in the long run still showing what questions/answers could be added.

So, if an unknown event occurred, in a random amount of these cases, it would be logged.

I think this is similar to what you are seeing here.

I did not like this way of working, so I removed this piece of code, and just logged these messages to a different file, so they were all present, but not clobbering the general logfile.