
Most efficient way to count occurrences?

I've got a primitive byte array whose elements can take arbitrary values. I'm trying to count how often each value occurs in the array in the most efficient/fastest way. Currently I'm using:

HashMap<Byte, Integer> dataCount = new HashMap<>();
for (byte b : data) dataCount.put(b, dataCount.getOrDefault(b, 0) + 1);

This for-each loop takes ~500 ms to process a byte[] of length 24883200. Using a regular indexed for loop takes at least 600 ms.

I've been thinking of constructing a Set of the distinct values (since a set only contains one of each element) and then filling a HashMap by calling Collections.frequency() for each of them, but constructing a Set from primitives requires several other calls, so I'm guessing it wouldn't be as fast.

What would be the fastest way to accomplish counting of occurrences of each item?

I'm using Java 8 and I'd prefer to avoid using Apache Commons if possible.

user_4685247 asked May 06 '15 17:05




2 Answers

If it's just bytes, use an array, don't use a map. You do have to use masking to deal with the signedness of bytes, but that's not a big deal.

int[] counts = new int[256];
for (byte b : data) {
   counts[b & 0xFF]++;
}

Arrays are just so massively compact and efficient that they're almost impossible to beat when you can use them.
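For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The class name ByteHistogram, the method name count and the sample data are made up for illustration; only the masked array increment comes from the answer above.

```java
class ByteHistogram {
    /** Returns an array where index i holds the number of bytes whose unsigned value is i. */
    static int[] count(byte[] data) {
        int[] counts = new int[256];
        for (byte b : data) {
            counts[b & 0xFF]++; // mask maps the signed range -128..127 onto 0..255
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the asker's data.
        byte[] data = {1, 1, -1, 127, -128, -1, -1};
        int[] counts = count(data);
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                // Casting the index back to byte recovers the original signed value.
                System.out.println((byte) i + " -> " + counts[i]);
            }
        }
    }
}
```

The results are read back by index, so iteration order follows the unsigned value (0 to 255), not the signed one.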

Louis Wasserman answered Sep 22 '22 13:09



I would create an array instead of a HashMap, given that you know exactly how many counts you need to keep track of:

int[] counts = new int[256];
for (byte b : data) {
    counts[b & 0xff]++;
}

That way:

  • You never need to do any boxing of either the keys or the values
  • Nothing needs to take a hash code, check for equality etc
  • It's about as memory-efficient as it gets
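If the rest of the code still expects a Map<Byte, Integer> like the one in the question, one option is to count in the array and convert once at the end. This is only a sketch: CountsToMap and countAsMap are hypothetical names, and whether the conversion is worth it depends on how the result is used. Boxing then happens at most 256 times instead of once per input byte.

```java
import java.util.HashMap;
import java.util.Map;

class CountsToMap {
    /** Counts in a primitive array, then boxes each distinct value exactly once. */
    static Map<Byte, Integer> countAsMap(byte[] data) {
        int[] counts = new int[256];
        for (byte b : data) {
            counts[b & 0xff]++; // hot loop: no boxing, no hashing
        }
        Map<Byte, Integer> result = new HashMap<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                result.put((byte) i, counts[i]); // at most 256 boxed entries
            }
        }
        return result;
    }
}
```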

Note that the & 0xff is used to get a value in the range [0, 255] instead of [-128, 127], so it's suitable as the index into the array.
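To make the masking concrete, the small sketch below shows what the mask does to a few values and how to get the original byte back from an index. MaskDemo, toIndex and toByte are illustrative names. Byte.toUnsignedInt, available since Java 8, performs the same conversion as the mask.

```java
class MaskDemo {
    /** Maps a signed byte (-128..127) to an array index (0..255). */
    static int toIndex(byte b) {
        return b & 0xff; // b is promoted to int, then the upper 24 bits are cleared
    }

    /** Maps an index (0..255) back to the signed byte it represents. */
    static byte toByte(int index) {
        return (byte) index; // the narrowing cast keeps only the low 8 bits
    }

    public static void main(String[] args) {
        byte[] samples = {0, 1, 127, -128, -1};
        for (byte b : samples) {
            int index = toIndex(b);
            System.out.println(b + " -> index " + index
                    + " (Byte.toUnsignedInt gives " + Byte.toUnsignedInt(b) + ")"
                    + " -> back to " + toByte(index));
        }
    }
}
```

Without the mask, a negative byte such as -1 would be used directly as the index and throw an ArrayIndexOutOfBoundsException.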

Jon Skeet answered Sep 24 '22 13:09
