Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I iterate over a large Java List of Integers in less time?

int[] records = job.getTargetSearchIDs();
topology.applyMatcherSearchWeight(records);
int[] mIDs = topology.getMatcherIds();
SystemResponse[] sysResponse = new SystemResponse[mIDs.length];
Map<Integer, SearchCommand> mrCmdsMap = new HashMap<Integer, SearchCommand>();

The length of mIDs is 250 and the length of records is 7.5 million integers. I want this loop to run in less than 3 seconds on a server with an 8-core Intel Xeon X5355 processor, 64-bit Linux (Ubuntu) and 32-bit Java.

for (long mID : mIDs) {
  List<Integer> recIDsToMatch = new LinkedList<Integer>();
  Matcher matcher = topology.getMatcherById(mID);

  for (long record : records) {
    if (matcher.getRange().isInRange(record))
      recIDsToMatch.add(record);
  }

  if (recIDsToMatch.size() > 0) {
    SearchCommand command = new SearchCommand(job.getMatchParameters(), 
      job.getRequestType(),
      job.getId(),
      job.getMatchParameters().getEngineProperties(),
      recIDsToMatch);

    command.setTimeout(searchTimeout, TimeUnit.SECONDS);
    mrCmdsMap.put(mID, command);
  }
}

What improvements come to mind when you read this code snippet? What data structure and/or algorithm improvements could be made?

like image 339
user861271 Avatar asked Jul 25 '11 11:07

user861271


People also ask

How do I efficiently iterate over each entry in a Java list?

forEach() Since Java 8, we can use the forEach() method to iterate over the elements of a list. This method is defined in the Iterable interface, and can accept Lambda expressions as a parameter.

Which for loop is faster in Java?

Iterator and for-each loop are faster than simple for loop for collections with no random access, while in collections which allows random access there is no performance change with for-each loop/for loop/iterator.

Which is faster for loop or forEach in Java?

The FOR loop without length caching and FOREACH work slightly faster on arrays than FOR with length caching. Array. Foreach performance is approximately 6 times slower than FOR / FOREACH performance. The FOR loop without length caching works 3 times slower on lists, comparing to arrays.


Video Answer


3 Answers

If isInRange() actually checks whether the given integer is in a particular range, perhaps it would be better to put records into a data structure that performs this operation in more efficient way.

For example, try to put records into TreeSet and then use subSet to find records in the range.

Another way is to build something like TreeMap<Integer, List<Matcher>> where value is a list of Matchers that cover a range between the current key and the following key. It can be even better, because number of Matchers is less than number of records.

like image 139
axtavt Avatar answered Oct 03 '22 07:10

axtavt


If you have large datasets and want speed and simplicity, consider using a text search engine like Lucene, which can index millions of documents and retrieve hits using quite complex matching parameters in a few milliseconds.

like image 31
Bohemian Avatar answered Oct 03 '22 05:10

Bohemian


One single loop doesn't take that advantage of multi-core... it would be better if you could break this loop iteration in subsets, creating threads.

For example: divide your array in 6 pieces, one thread for each piece.

like image 34
woliveirajr Avatar answered Oct 03 '22 05:10

woliveirajr