
Memory efficiency of clearing a HashSet vs. creating a new HashSet

Curiosity and efficiency are the reasons for this question. I am in a situation where I create many new HashSets after certain loops run.

The HashSet is currently declared like this at the top of the class:

private Set<String> failedTests;

Then later in the code, I just create a new failedTests HashSet whenever I am re-running the tests:

failedTests = new HashSet<String>(16384);

I do this over and over, depending on the size of the test, and I expect the garbage collector to handle the old data efficiently. But I know another option would be to create the HashSet once at the beginning:

private Set<String> failedTests = new HashSet<String>(16384);

and then clear the HashSet each time through the loop:

failedTests.clear();
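
To make the two options concrete, here is a minimal sketch of how they sit in the loop (the class name and runAllTests are illustrative placeholders, not my real harness):

import java.util.HashSet;
import java.util.Set;

class TestRunner {
    private Set<String> failedTests;

    // Variant A: allocate a fresh set for every run, let the GC reclaim the old one.
    void runFresh(int totalRuns) {
        for (int run = 0; run < totalRuns; run++) {
            failedTests = new HashSet<String>(16384);
            runAllTests(failedTests); // fills the set with failing test names
        }
    }

    // Variant B: allocate once, clear between runs.
    void runClearing(int totalRuns) {
        failedTests = new HashSet<String>(16384);
        for (int run = 0; run < totalRuns; run++) {
            failedTests.clear();
            runAllTests(failedTests);
        }
    }

    void runAllTests(Set<String> sink) { /* placeholder for the real test harness */ }
}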

My question is: which of these is the most efficient way in terms of overhead? I don't know what the clear() method is doing inside: is it doing the same thing, sending the old data to the garbage collector, or is it doing something even more efficient? Also, I am giving the HashSet a large cushion of initial capacity, but if a test requires more than 2^14 elements, will the clear() method shrink the HashSet back to 16384?

To add: I found the source code of clear() here, so it is at least an O(n) operation in the worst case.

Using the clear() method, a test run finished in 565 seconds. Letting the GC handle the old sets instead, it finished in 506 seconds.

It's not a perfect benchmark, though, because there are external factors such as interfacing with the computer's and the network's file systems. Still, a full minute of difference does feel significant. Can anyone recommend a profiler that works at the line/method level? (I am using Eclipse Indigo.)
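
For anyone who wants to reproduce a rough comparison without my test harness, here is the kind of bare-bones timing loop I would use (class name, run counts, and element counts are all made up; it only measures the set allocation/clearing itself, not real test work):

import java.util.HashSet;
import java.util.Set;

public class ClearVsNewBenchmark {
    static final int RUNS = 1_000;
    static final int ELEMENTS = 10_000;

    public static void main(String[] args) {
        // Warm-up pass so the JIT compiles the fill path before timing.
        fillAndClear(new HashSet<String>(16384));

        // Variant B: one set, cleared between runs.
        long t0 = System.nanoTime();
        Set<String> reused = new HashSet<String>(16384);
        for (int run = 0; run < RUNS; run++) {
            fill(reused);
            reused.clear();
        }
        long clearNanos = System.nanoTime() - t0;

        // Variant A: a fresh set per run, old ones left for the GC.
        t0 = System.nanoTime();
        for (int run = 0; run < RUNS; run++) {
            Set<String> fresh = new HashSet<String>(16384);
            fill(fresh);
        }
        long newNanos = System.nanoTime() - t0;

        System.out.printf("clear(): %d ms, new HashSet: %d ms%n",
                clearNanos / 1_000_000, newNanos / 1_000_000);
    }

    static void fill(Set<String> set) {
        for (int i = 0; i < ELEMENTS; i++) set.add("test-" + i);
    }

    static void fillAndClear(Set<String> set) {
        fill(set);
        set.clear();
    }
}

This is not rigorous either: JIT warm-up and GC pauses can skew a hand-rolled loop like this, and a proper microbenchmark harness such as JMH would control for both.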


1 Answer

I don't know what the clear() method is doing inside

It calls the clear() method of the HashMap that it uses internally as its backing table. Within HashMap, the clear() method is defined as follows (comments mine):

public void clear() {
    modCount++;               // invalidates any in-progress iterators
    Entry[] tab = table;
    for (int i = 0; i < tab.length; i++)
        tab[i] = null;        // drop each bucket so its entries become GC-eligible
    size = 0;                 // note: the table array itself keeps its current length
}

is it doing the same thing, sending the old data to the garbage collector, or is it doing something even more efficient?

The tab[i] = null assignments show that it is making the old data eligible for garbage collection.

Also, I am giving the HashSet a large cushion of initial capacity, but if a test requires more than 2^14 elements, will the clear() method shrink the HashSet back to 16384?

No, it won't. As the source above shows, clear() only nulls out the slots of the existing table and resets size; the table array itself, including any capacity it has grown to, is kept as-is.
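
You can verify this with a quick reflective peek at the backing table. This is only a sketch: it relies on HashSet's private map field and HashMap's private table field, so it needs a JDK where those are reflectively accessible (pre-Java 9, or Java 9+ with --add-opens java.base/java.util=ALL-UNNAMED), and the class name is made up:

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

public class ClearCapacityDemo {
    // Reads the length of the backing HashMap's bucket array via reflection.
    static int tableLength(Set<String> set) throws Exception {
        Field mapField = HashSet.class.getDeclaredField("map");
        mapField.setAccessible(true);
        HashMap<?, ?> map = (HashMap<?, ?>) mapField.get(set);
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(map);
        return table == null ? 0 : table.length; // null until first put on lazy-init JDKs
    }

    public static void main(String[] args) throws Exception {
        Set<String> failedTests = new HashSet<String>(16384);
        for (int i = 0; i < 20_000; i++) {      // force growth past 2^14
            failedTests.add("test-" + i);
        }
        // Typically 32768 with the default load factor of 0.75.
        System.out.println("capacity after growth: " + tableLength(failedTests));
        failedTests.clear();
        // Same number again: clear() does not shrink the table.
        System.out.println("capacity after clear(): " + tableLength(failedTests));
    }
}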

which of these is the most efficient way in terms of overhead?

The Java garbage collector knows how to do its work efficiently; in particular, short-lived objects are very cheap to reclaim in a generational collector. So I would let the garbage collector take care of this and simply create a new failedTests HashSet each time it is needed.
