
Memory leak using TensorFlow for Java

The following test code leaks memory:

private static final float[] X = new float[]{1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0};

public void testTensorFlowMemory() {
    // create a graph and session
    try (Graph g = new Graph(); Session s = new Session(g)) {
        // create a placeholder x and a const for the dimension to do a cumulative sum along
        Output x = g.opBuilder("Placeholder", "x").setAttr("dtype", DataType.FLOAT).build().output(0);
        Output dims = g.opBuilder("Const", "dims").setAttr("dtype", DataType.INT32)
                .setAttr("value", Tensor.create(0)).build().output(0);
        Output y = g.opBuilder("Cumsum", "y").addInput(x).addInput(dims).build().output(0);
        // loop a bunch to test memory usage
        for (int i = 0; i < 10000000; i++) {
            // create a tensor from X
            Tensor tx = Tensor.create(X);
            // run the graph and fetch the resulting y tensor
            Tensor ty = s.runner().feed("x", tx).fetch("y").run().get(0);
            // close the tensors to release their resources
            tx.close();
            ty.close();
        }

        System.out.println("non-threaded test finished");
    }
}

Is there something obvious I'm doing wrong? The basic flow is to create a graph and a session on that graph, create a placeholder and a constant in order to do a cumulative sum on a tensor fed in as x. After running the resulting y operation, I close both the x and y tensors to free their memory resources.

Things I believe so far to help:

  • This is not a Java object memory problem: the heap does not grow, and other JVM memory is not growing either, according to jvisualvm. It also doesn't appear to be a JVM memory leak according to Java's Native Memory Tracking.
  • The close operations help: without them, the memory grows by leaps and bounds. With them in place it still grows fairly fast, but not nearly as much as without them. (An equivalent try-with-resources version of the loop body is sketched after this list.)
  • The Cumsum operator is not important; the leak happens with Sum and other operators as well.
  • It happens on Mac OS with TF 1.1, and on CentOS 7 with TF 1.1 and 1.2_rc0.
  • Commenting out the Tensor ty lines removes the leak, so it appears to originate there.
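
For reference, here is a minimal sketch of the same loop body written with try-with-resources instead of the manual close() calls. This assumes Tensor implements AutoCloseable the same way Graph and Session do (as used in the try block above); the behavior should be identical:

    // equivalent loop body using try-with-resources
    try (Tensor tx = Tensor.create(X);
         Tensor ty = s.runner().feed("x", tx).fetch("y").run().get(0)) {
        // both tensors are closed automatically when this block exits
    }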

Any ideas? Thanks! Also, here's a GitHub project that demonstrates this issue with both a threaded test (to grow the memory faster) and an unthreaded test (to show it's not due to threading). It uses Maven and can be run simply with:

    mvn test
Asked May 20 '17 by Jesse Pangburn


1 Answer

I believe there is indeed a leak, in particular a missing TF_DeleteStatus corresponding to an allocation in the JNI code. (Thanks for the detailed instructions to reproduce it.)

I'd encourage you to file an issue at http://github.com/tensorflow/tensorflow/issues; hopefully it will be fixed before the final 1.2 release.

(Relatedly, you also have a leak outside the loop, since the Tensor object created by Tensor.create(0) is never closed.)
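
As a minimal sketch of fixing that, the constant can be built from a temporary tensor that is closed right after build(), assuming (as the TensorFlow Java examples do) that setAttr("value", ...) copies the tensor's data when the op is constructed:

    // build the Const from a temporary tensor, then close it
    Output dims;
    try (Tensor t = Tensor.create(0)) {
        dims = g.opBuilder("Const", "dims")
                .setAttr("dtype", DataType.INT32)
                .setAttr("value", t)
                .build().output(0);
    }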

UPDATE: This was fixed and 1.2.0-rc1 should no longer have this problem.

Answered Oct 09 '22 by ash