int[] records = job.getTargetSearchIDs();
topology.applyMatcherSearchWeight(records);
int[] mIDs = topology.getMatcherIds();
SystemResponse[] sysResponse = new SystemResponse[mIDs.length];
Map<Integer, SearchCommand> mrCmdsMap = new HashMap<Integer, SearchCommand>();
mIDs has 250 entries and records holds 7.5 million integers. I want this loop to run in under 3 seconds on a server with an 8-core Intel Xeon X5355 processor, 64-bit Linux (Ubuntu), and 32-bit Java.
for (int mID : mIDs) {
    List<Integer> recIDsToMatch = new LinkedList<Integer>();
    Matcher matcher = topology.getMatcherById(mID);
    // Full scan: every one of the 250 matchers checks all 7.5 million records.
    for (int record : records) {
        if (matcher.getRange().isInRange(record))
            recIDsToMatch.add(record);
    }
    if (recIDsToMatch.size() > 0) {
        SearchCommand command = new SearchCommand(job.getMatchParameters(),
                job.getRequestType(),
                job.getId(),
                job.getMatchParameters().getEngineProperties(),
                recIDsToMatch);
        command.setTimeout(searchTimeout, TimeUnit.SECONDS);
        mrCmdsMap.put(mID, command);
    }
}
What improvements come to mind when you read this code snippet? What data structure and/or algorithm improvements could be made?
Since Java 8, we can use the forEach() method to iterate over the elements of a list. This method is defined in the Iterable interface and can accept a lambda expression as a parameter.
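For instance, a minimal sketch (with illustrative names, assuming the record IDs are already in a List) might look like this:

import java.util.Arrays;
import java.util.List;

public class ForEachDemo {
    public static void main(String[] args) {
        List<Integer> recordIds = Arrays.asList(10, 20, 30);
        // The lambda receives each element in turn.
        recordIds.forEach(id -> System.out.println("processing record " + id));
    }
}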
Iterators and the for-each loop are faster than a simple for loop for collections without random access; for collections that allow random access, there is no performance difference between the for-each loop, the for loop, and an iterator.
On arrays, the FOR loop without length caching and FOREACH work slightly faster than FOR with length caching. Array.forEach performance is approximately 6 times slower than FOR/FOREACH performance. On lists, the FOR loop without length caching runs about 3 times slower than it does on arrays.
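To make the terms concrete, here is an illustrative sketch of the three loop shapes being compared (this is not the benchmark that produced the numbers above):

public class LoopVariants {
    // Plain FOR without length caching: data.length is evaluated on every iteration.
    static long sumFor(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) total += data[i];
        return total;
    }

    // FOR with length caching: the bound is copied into a local variable once.
    static long sumForCached(int[] data) {
        long total = 0;
        for (int i = 0, n = data.length; i < n; i++) total += data[i];
        return total;
    }

    // FOREACH: the enhanced for loop, compiled to a plain index loop for arrays.
    static long sumForEach(int[] data) {
        long total = 0;
        for (int value : data) total += value;
        return total;
    }
}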
If isInRange() actually checks whether the given integer is in a particular range, it may be better to put the records into a data structure that performs this operation more efficiently. For example, try putting the records into a TreeSet and then using subSet to find the records in each range.
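A rough sketch of that idea, assuming each matcher exposes inclusive low/high bounds (the getters here are hypothetical), could look like this:

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RangeLookupWithTreeSet {
    // subSet is O(log n) to locate the bounds, then linear in the number of hits only.
    static List<Integer> recordsInRange(TreeSet<Integer> sortedRecords, int low, int high) {
        return new ArrayList<Integer>(sortedRecords.subSet(low, true, high, true));
    }

    public static void main(String[] args) {
        TreeSet<Integer> sortedRecords = new TreeSet<Integer>();
        for (int r : new int[] {5, 17, 42, 99, 250}) {
            sortedRecords.add(r);
        }
        System.out.println(recordsInRange(sortedRecords, 10, 100)); // prints [17, 42, 99]
    }
}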
Another way is to build something like a TreeMap&lt;Integer, List&lt;Matcher&gt;&gt;, where the value is the list of Matchers that cover the range between the current key and the following key. That can be even better, because the number of Matchers is much smaller than the number of records.
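A sketch of that second idea (Range is a stand-in for the real matcher range type, and the bounds are assumed inclusive): each record then needs a single floorEntry() lookup instead of 250 isInRange() calls.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class MatcherIndex {
    static class Range {
        final int low, high; // inclusive bounds, assumed accessible on the real matcher
        Range(int low, int high) { this.low = low; this.high = high; }
        boolean contains(int v) { return v >= low && v <= high; }
    }

    // Builds a map from each boundary point to the ranges covering [key, nextKey).
    static TreeMap<Integer, List<Range>> index(List<Range> ranges) {
        TreeSet<Integer> boundaries = new TreeSet<Integer>();
        for (Range r : ranges) {
            boundaries.add(r.low);
            boundaries.add(r.high + 1); // coverage ends just past the high bound
        }
        TreeMap<Integer, List<Range>> index = new TreeMap<Integer, List<Range>>();
        for (int b : boundaries) {
            List<Range> covering = new ArrayList<Range>();
            for (Range r : ranges) {
                if (r.contains(b)) covering.add(r);
            }
            index.put(b, covering);
        }
        return index;
    }

    // Returns the ranges covering a record with a single O(log n) lookup.
    static List<Range> matchersFor(TreeMap<Integer, List<Range>> index, int record) {
        Map.Entry<Integer, List<Range>> e = index.floorEntry(record);
        return e == null ? Collections.<Range>emptyList() : e.getValue();
    }
}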
If you have large datasets and want speed and simplicity, consider using a text search engine like Lucene, which can index millions of documents and retrieve hits using quite complex matching parameters in a few milliseconds.
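A rough sketch of that approach, assuming Lucene 8.x and indexing each record ID as an IntPoint (the field name and IDs here are made up; real code would page through all hits rather than take the top 100):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class RecordRangeIndex {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (int id : new int[] {5, 17, 42, 99}) {
                Document doc = new Document();
                doc.add(new IntPoint("recordId", id));    // indexed for range queries
                doc.add(new StoredField("recordId", id)); // stored so it can be read back
                writer.addDocument(doc);
            }
        }
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
        TopDocs hits = searcher.search(IntPoint.newRangeQuery("recordId", 10, 100), 100);
        for (ScoreDoc hit : hits.scoreDocs) {
            System.out.println(searcher.doc(hit.doc).getField("recordId").numericValue());
        }
    }
}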
A single loop doesn't take advantage of multiple cores; it would be better to break the iteration into subsets and process them on separate threads.
For example: divide your array into 6 pieces, one thread for each piece, as in the sketch below.
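One way to do that is with a fixed thread pool; the per-chunk work below is a placeholder, not the real matching logic:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScan {
    // Placeholder chunk work: count "matching" records in records[from, to).
    static long processChunk(int[] records, int from, int to) {
        long matches = 0;
        for (int i = from; i < to; i++) {
            if (records[i] % 2 == 0) matches++; // stand-in for the real range check
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        final int[] records = new int[7_500_000];
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (records.length + threads - 1) / threads;

        List<Future<Long>> results = new ArrayList<Future<Long>>();
        for (int start = 0; start < records.length; start += chunk) {
            final int from = start;
            final int to = Math.min(start + chunk, records.length);
            results.add(pool.submit((Callable<Long>) () -> processChunk(records, from, to)));
        }
        long total = 0;
        for (Future<Long> f : results) {
            total += f.get(); // wait for each chunk and combine the partial results
        }
        pool.shutdown();
        System.out.println("matched " + total + " records");
    }
}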