
Reliable method of cleaning up an external resource associated with an Object

Concrete use case: There is an abstraction for binary data, which is widely used to handle binary blobs of arbitrary size. Since the abstraction was created without thought about things outside the VM, existing implementations rely on the garbage collector for their life cycle.

Now I want to add a new implementation that uses off-heap storage (e.g. in a temporary file). Since there is a lot of existing code that uses the abstraction, introducing additional methods for explicit life cycle management is impractical: I can't rewrite every client use case to ensure it manages the new life cycle requirements.

I can think of two solution approaches, but can't decide which one is better:

a.) Use of finalize() to manage the associated resource's life cycle (e.g. the temporary file is deleted in finalize()). This seems very simple to implement.

b.) Use of a reference queue and java.lang.Reference (but which one, weak or phantom?) with some extra object that deletes the file when the reference is enqueued. This seems to be a bit more work to implement: I would need to create not only the new implementation, but also separate out its cleanup data and ensure the cleanup object can't be GC'd before the object that has been exposed to the user.

c.) Some other method I haven't thought of?

Which approach should I take (and why should I prefer it)? Implementation hints are also welcome.


Edit: Degree of reliability required - for my purpose it's perfectly fine if a temporary file is not cleaned up in case the VM terminates abruptly. The main concern is that while the VM runs, it could very well fill up the local disk (over the course of a few days) with temporary files (this has happened to me for real with Apache Tika, which created temporary files when extracting text from certain document types; zip files were the culprit, I believe). I have a periodic cleanup scheduled on the machine, so if a file is missed by the cleanup it isn't the end of the world, as long as it doesn't happen regularly within a short interval.

As far as I could determine, finalize() works with the Oracle JRE. And if I interpret the javadocs correctly, References must work as documented (there is no way an only softly/weakly reachable reference object is not cleared before OutOfMemoryError is thrown). This would mean that while the VM may decide not to reclaim a particular object for a long time, it has to do so at the latest when the heap gets full. In turn this means only a limited number of my file-based blobs can exist on the heap. The VM has to clean them up at some point, or it would definitely run out of memory. Or is there any loophole that allows the VM to run OOM without clearing references (assuming they aren't strongly referenced anymore)?


Edit2: As far as I can see at this point, both finalize() and Reference should be reliable enough for my purposes, but I gather Reference may be the better solution, since its interaction with the GC can't revive dead objects and its performance impact should therefore be lower?
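
To make the "can't revive dead objects" point concrete, here is a minimal sketch of how the two reference kinds from option b.) differ when queried. The class and method names are hypothetical and only serve the illustration. A weak reference hands back its referent for as long as the referent is alive, whereas PhantomReference.get() always returns null, so cleanup code built on phantom references never gets hold of the object again.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo class; the name is not from the original post.
final class ReferenceKindsDemo {

    /** Returns what get() yields for a weak reference to a live referent. */
    static Object weakGet(Object referent) {
        return new WeakReference<Object>(referent).get();
    }

    /** Returns what get() yields for a phantom reference to a live referent. */
    static Object phantomGet(Object referent) {
        // A phantom reference must be created with a queue to be useful.
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
        return new PhantomReference<Object>(referent, queue).get();
    }

    public static void main(String[] args) {
        Object blob = new Object();
        // The caller still holds 'blob', so the weak reference is not cleared.
        System.out.println("weak.get() == blob: " + (weakGet(blob) == blob));
        // PhantomReference.get() returns null regardless of reachability.
        System.out.println("phantom.get() == null: " + (phantomGet(blob) == null));
    }
}
```

Because the referent is unavailable through a phantom reference, whatever the cleanup needs (here, the file path) has to be stored separately from the object being tracked.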


Edit3: Solution approaches which rely on VM termination or startup (shutdown hook or similar) are not of use to me, since typically the VM runs for extended periods of time (server environment).

Durandal asked Sep 12 '12 17:09


2 Answers

Here's a relevant item from Effective Java: Avoid finalizers

Contained within that item is a recommendation to do just what @delnan suggests in a comment: provide an explicit termination method. Plenty of examples provided as well: InputStream.close(), Graphics.dispose(), etc. Understand that the cows may have already left the barn on that one...
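
For completeness, a minimal sketch of what an explicit termination method could look like for a temp-file blob, in case some new client code is able to use it. ClosableTempBlob and its methods are hypothetical names, not part of the existing abstraction, and the sketch assumes Java 7 or newer for try-with-resources and java.nio.file.Files.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical class for illustration; not part of the original abstraction.
final class ClosableTempBlob implements AutoCloseable {

    private final File file;

    ClosableTempBlob(byte[] data) throws IOException {
        this.file = File.createTempFile("blob", null);
        Files.write(this.file.toPath(), data);
    }

    byte[] read() throws IOException {
        return Files.readAllBytes(this.file.toPath());
    }

    /** Exposed only so the demo can check whether the file is gone. */
    File file() {
        return this.file;
    }

    /** Explicit termination method: deletes the backing file; safe to call twice. */
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(this.file.toPath());
    }

    public static void main(String[] args) throws IOException {
        File backing;
        // try-with-resources calls close() when the block exits.
        try (ClosableTempBlob blob = new ClosableTempBlob(new byte[] {1, 2, 3})) {
            backing = blob.file();
            System.out.println("bytes read: " + blob.read().length);
        }
        System.out.println("file still exists: " + backing.exists());
    }
}
```

This only helps callers that actually invoke close(), which is why the reference-based approach is still needed for the existing clients.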

At any rate, here's a sketch of how this might be accomplished with reference objects. First, an interface for binary data:

import java.io.IOException;

public interface Blob {
    public byte[] read() throws IOException;
    public void update(byte[] data) throws IOException;
}

Next, a file-based implementation:

import java.io.File;
import java.io.IOException;

public class FileBlob implements Blob {

    private final File file;

    public FileBlob(File file) {
        super();
        this.file = file;
    }

    @Override
    public byte[] read() throws IOException {
        throw new UnsupportedOperationException();
    }

    @Override
    public void update(byte[] data) throws IOException {
        throw new UnsupportedOperationException();
    }
}
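
The two methods above are left as stubs. As a rough sketch of how they might be filled in, assuming read() is meant to return the whole file and update() is meant to replace its contents (the original does not specify either), something like the following would do. SimpleFileBlob is a hypothetical name, and the Blob interface is repeated so the block compiles on its own.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Repeated from above so that this block compiles on its own.
interface Blob {
    byte[] read() throws IOException;
    void update(byte[] data) throws IOException;
}

// Hypothetical variant of FileBlob with the two stubs filled in.
final class SimpleFileBlob implements Blob {

    private final File file;

    SimpleFileBlob(File file) {
        this.file = file;
    }

    @Override
    public byte[] read() throws IOException {
        // Assumption: the blob is small enough to load into memory at once.
        return Files.readAllBytes(this.file.toPath());
    }

    @Override
    public void update(byte[] data) throws IOException {
        // With no options given, Files.write creates the file if needed
        // and truncates any existing content before writing.
        Files.write(this.file.toPath(), data);
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("blob", null);
        try {
            Blob blob = new SimpleFileBlob(tmp);
            blob.update("hello".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(blob.read(), StandardCharsets.UTF_8));
        } finally {
            tmp.delete();
        }
    }
}
```

For blobs of truly arbitrary size a streaming API would be more appropriate than byte arrays, but that depends on what the real abstraction looks like.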

Then, a factory to create and track the file-based blobs:

import java.io.File;
import java.io.IOException;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class FileBlobFactory {

    private static final long TIMER_PERIOD_MS = 10000;

    private final ReferenceQueue<File> queue;
    private final ConcurrentMap<PhantomReference<File>, String> refs;
    private final Timer reaperTimer;

    public FileBlobFactory() {
        super();
        this.queue = new ReferenceQueue<File>();
        this.refs = new ConcurrentHashMap<PhantomReference<File>, String>();
        this.reaperTimer = new Timer("FileBlob reaper timer", true);
        this.reaperTimer.scheduleAtFixedRate(new FileBlobReaper(), TIMER_PERIOD_MS, TIMER_PERIOD_MS);
    }

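    public Blob create() throws IOException {
        File blobFile = File.createTempFile("blob", null);
        //blobFile.deleteOnExit();
        // Store only the path string: keeping the File itself in the map would
        // leave it strongly reachable, and the reference would never be enqueued.
        String blobFilePath = blobFile.getCanonicalPath();
        FileBlob blob = new FileBlob(blobFile);
        // The map also keeps the PhantomReference alive until the reaper handles it.
        this.refs.put(new PhantomReference<File>(blobFile, this.queue), blobFilePath);
        return blob;
    }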

    public void shutdown() {
        this.reaperTimer.cancel();
    }

    private class FileBlobReaper extends TimerTask {
        @Override
        public void run() {
            System.out.println("FileBlob reaper task begin");
            Reference<? extends File> ref = FileBlobFactory.this.queue.poll();
            while (ref != null) {
                String blobFilePath = FileBlobFactory.this.refs.remove(ref);
                File blobFile = new File(blobFilePath);
                boolean isDeleted = blobFile.delete();
                System.out.println("FileBlob reaper deleted " + blobFile + ": " + isDeleted);
                ref = FileBlobFactory.this.queue.poll();
            }
            System.out.println("FileBlob reaper task end");
        }
    }
}

Finally, a test that includes some artificial GC "pressure" to get things going:

import java.io.IOException;

public class FileBlobTest {

    public static void main(String[] args) {
        FileBlobFactory factory = new FileBlobFactory();
        for (int i = 0; i < 10; i++) {
            try {
                factory.create();
            } catch (IOException exc) {
                exc.printStackTrace();
            }
        }

        while(true) {
            try {
                Thread.sleep(5000);
                System.gc(); System.gc(); System.gc();
            } catch (InterruptedException exc) {
                exc.printStackTrace();
                System.exit(1);
            }
        }
    }
}

Which should produce some output like:

FileBlob reaper task begin
FileBlob reaper deleted C:\WINDOWS\Temp\blob1055430495823649476.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob873625122345395275.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob4123088770942737465.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob1631534546278785404.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob6150533076250997032.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob7075872276085608840.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob5998579368597938203.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob3779536278201681316.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob8720399798060613253.tmp: true
FileBlob reaper deleted C:\WINDOWS\Temp\blob3046359448721598425.tmp: true
FileBlob reaper task end

kschneid answered Nov 15 '22 22:11


This is the solution I cooked up after kschneid's reference-based example (just in case someone needs a generically usable implementation). It's documented and should be easy to understand/adapt:

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

/**
 * Helper class for cleaning up resources when an object is
 * garbage collected. Use as follows (either an anonymous subclass or a
 * named subclass is fine; be extra careful not to retain
 * a reference to the trigger, including the implicit reference an
 * anonymous or inner class can hold to its enclosing instance):
 * 
 * new ResourceFinalizer(trigger) {
 * 
 *     // put user defined state relevant for cleanup here
 *     
 *     protected void cleanup() {
 *         // implement cleanup procedure.
 *     }
 * }
 *
 * Typical application is closing of native resources when an object
 * is garbage collected (e.g. VM external resources).
 * 
 * You must not retain any references from the ResourceFinalizer to the
 * trigger (otherwise the trigger can never become eligible for GC).
 * You can however retain references to the ResourceFinalizer from the
 * trigger, so you can access the data relevant for the finalizer
 * from the trigger (no need to duplicate the data).
 * There is no need to explicitly reference the finalizer after it has
 * been created, the finalizer base class will ensure the finalizer
 * itself is not eligible for GC until it has been run.
 * 
 * When the VM terminates, ResourceFinalizers that have not been
 * triggered will run, regardless of the state of their triggers
 * (that is, even if the triggers are still reachable, the finalizer
 * will be called). There are no guarantees on this: if the VM
 * is terminated abruptly, this step may not take place.
 */
public abstract class ResourceFinalizer {

    /**
     * Constructs a ResourceFinalizer that is triggered when the
     * object referenced by trigger is garbage collected.
     * 
     * To make this work, you must ensure there are no references to
     * the trigger object from the ResourceFinalizer.
     */
    protected ResourceFinalizer(final Object trigger) {
        // create reference to trigger and register this finalizer
        final Reference<Object> reference = new PhantomReference<Object>(trigger, referenceQueue);
        synchronized (finalizerMap) {
            finalizerMap.put(reference, this);
        }
    }

    /**
     * The cleanup() method is called when the trigger
     * has been garbage collected.
     */
    protected abstract void cleanup();

    // --------------------------------------------------------------
    // ---
    // --- Background finalization management
    // ---
    // --------------------------------------------------------------

    /**
     * The reference queue used to interact with the garbage collector.
     */
    private final static ReferenceQueue<Object> referenceQueue = new ReferenceQueue<Object>();

    /**
     * Global static map of finalizers. Enqueued references are used as key
     * to find the finalizer for the referent.
     */
    private final static HashMap<Reference<?>, ResourceFinalizer> finalizerMap =
            new HashMap<Reference<?>, ResourceFinalizer>(16, 2F);

    static {
        // create and start finalizer thread
        final Thread mainLoop = new Thread(new Runnable() {
            @Override
            public void run() {
                finalizerMainLoop();
            }
        }, "ResourceFinalizer");
        mainLoop.setDaemon(true);
        mainLoop.setPriority(Thread.NORM_PRIORITY + 1);
        mainLoop.start();

        // add a shutdown hook to take care of resources when the VM terminates
        final Thread shutdownHook = new Thread(new Runnable() {
            @Override
            public void run() {
                shutdownHook();
            }
        });
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    /**
     * Main loop that runs permanently and executes the finalizers for
     * each object that has been garbage collected. 
     */
    private static void finalizerMainLoop() {
        while (true) {
            final Reference<?> reference;
            try {
                reference = referenceQueue.remove();
            } catch (final InterruptedException e) {
                // this will terminate the thread, should never happen
                throw new RuntimeException(e);
            }
            final ResourceFinalizer finalizer;
            // find the finalizer for the reference
            synchronized (finalizerMap) {
                finalizer = finalizerMap.remove(reference);
            }
            // run the finalizer
            callFinalizer(finalizer);
        }
    }

    /**
     * Called when the VM shuts down normally. Takes care of calling
     * all finalizers that haven't been triggered yet.
     */
    private static void shutdownHook() {
        // get all remaining resource finalizers
        final List<ResourceFinalizer> remaining;
        synchronized (finalizerMap) {
            remaining = new ArrayList<ResourceFinalizer>(finalizerMap.values());
            finalizerMap.clear();
        }
        // call all remaining finalizers
        for (final ResourceFinalizer finalizer : remaining) {
            callFinalizer(finalizer);
        }
    }

    private static void callFinalizer(final ResourceFinalizer finalizer) {
        // may be null if the shutdown hook already drained the map
        if (finalizer == null) {
            return;
        }
        try {
            finalizer.cleanup();
        } catch (final Exception e) {
            // don't care if a finalizer throws
        }
    }

}
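
For readers on Java 9 or newer: the JDK now ships java.lang.ref.Cleaner, which covers much the same ground as the class above (phantom references plus a daemon thread), so a hand-rolled helper may not be needed. What follows is only a sketch under that assumption, with hypothetical class names. As far as I know, Cleaner makes no promise to run pending actions at VM exit, so it does not replace the shutdown hook used above.

```java
import java.io.File;
import java.io.IOException;
import java.lang.ref.Cleaner;

// Hypothetical class for illustration; requires Java 9 or newer.
final class CleanerBackedBlob {

    // One shared Cleaner (and its single daemon thread) for all blobs.
    private static final Cleaner CLEANER = Cleaner.create();

    /**
     * Cleanup state. It is a static nested class and holds only the File,
     * so it never references the CleanerBackedBlob that owns it.
     */
    private static final class DeleteFile implements Runnable {
        private final File file;

        DeleteFile(File file) {
            this.file = file;
        }

        @Override
        public void run() {
            this.file.delete();
        }
    }

    private final File file;
    private final Cleaner.Cleanable cleanable;

    CleanerBackedBlob() throws IOException {
        this.file = File.createTempFile("blob", null);
        // The action runs after this blob becomes phantom reachable,
        // or earlier if release() is called.
        this.cleanable = CLEANER.register(this, new DeleteFile(this.file));
    }

    /** Exposed only so the demo can check whether the file is gone. */
    File file() {
        return this.file;
    }

    /** Optional explicit release; the cleanup action runs at most once. */
    void release() {
        this.cleanable.clean();
    }

    public static void main(String[] args) throws IOException {
        CleanerBackedBlob blob = new CleanerBackedBlob();
        File backing = blob.file();
        blob.release();
        System.out.println("file still exists: " + backing.exists());
    }
}
```

The same rule as for the trigger above applies: the cleanup action must not reference the registered object, which is why the state lives in a static nested class rather than a lambda or anonymous class that could capture this.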

Durandal answered Nov 15 '22 21:11