int[] records = job.getTargetSearchIDs();
topology.applyMatcherSearchWeight(records);
int[] mIDs = topology.getMatcherIds();
SystemResponse[] sysResponse = new SystemResponse[mIDs.length];
Map<Integer, SearchCommand> mrCmdsMap = new HashMap<Integer, SearchCommand>();
mIDs has 250 entries and records holds 7.5 million integers. I want this loop to run in under 3 seconds on a server with an 8-core Intel Xeon X5355 processor, 64-bit Linux (Ubuntu), and 32-bit Java.
for (int mID : mIDs) {
    List<Integer> recIDsToMatch = new LinkedList<Integer>();
    Matcher matcher = topology.getMatcherById(mID);
    // Full scan: every one of the 250 matchers checks all 7.5 million records.
    for (int record : records) {
        if (matcher.getRange().isInRange(record))
            recIDsToMatch.add(record);
    }
    if (recIDsToMatch.size() > 0) {
        SearchCommand command = new SearchCommand(job.getMatchParameters(),
                job.getRequestType(),
                job.getId(),
                job.getMatchParameters().getEngineProperties(),
                recIDsToMatch);
        command.setTimeout(searchTimeout, TimeUnit.SECONDS);
        mrCmdsMap.put(mID, command);
    }
}
What improvements come to mind when you read this code snippet? What data structure and/or algorithm improvements could be made?
Since Java 8, we can use the forEach() method to iterate over the elements of a list. This method is defined in the Iterable interface and can accept a lambda expression as a parameter.
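For instance, a minimal sketch (with illustrative names, assuming the record IDs are already in a List) might look like this:

import java.util.Arrays;
import java.util.List;

public class ForEachDemo {
    public static void main(String[] args) {
        List<Integer> recordIds = Arrays.asList(10, 20, 30);
        // The lambda receives each element in turn.
        recordIds.forEach(id -> System.out.println("processing record " + id));
    }
}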
Iterators and the for-each loop are faster than a simple for loop for collections without random access; for collections that allow random access, there is no performance difference between the for-each loop, the for loop, and an iterator.
On arrays, the FOR loop without length caching and FOREACH work slightly faster than FOR with length caching. Array.forEach performance is approximately 6 times slower than FOR/FOREACH performance. On lists, the FOR loop without length caching runs about 3 times slower than it does on arrays.
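To make the terms concrete, here is an illustrative sketch of the three loop shapes being compared (this is not the benchmark that produced the numbers above):

public class LoopVariants {
    // Plain FOR without length caching: data.length is evaluated on every iteration.
    static long sumFor(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) total += data[i];
        return total;
    }

    // FOR with length caching: the bound is copied into a local variable once.
    static long sumForCached(int[] data) {
        long total = 0;
        for (int i = 0, n = data.length; i < n; i++) total += data[i];
        return total;
    }

    // FOREACH: the enhanced for loop, compiled to a plain index loop for arrays.
    static long sumForEach(int[] data) {
        long total = 0;
        for (int value : data) total += value;
        return total;
    }
}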
If isInRange() actually checks whether the given integer is in a particular range, it may be better to put the records into a data structure that performs this operation more efficiently. For example, try putting the records into a TreeSet and then using subSet to find the records in each range.
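A rough sketch of that idea, assuming each matcher exposes inclusive low/high bounds (the getters here are hypothetical), could look like this:

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RangeLookupWithTreeSet {
    // subSet is O(log n) to locate the bounds, then linear in the number of hits only.
    static List<Integer> recordsInRange(TreeSet<Integer> sortedRecords, int low, int high) {
        return new ArrayList<Integer>(sortedRecords.subSet(low, true, high, true));
    }

    public static void main(String[] args) {
        TreeSet<Integer> sortedRecords = new TreeSet<Integer>();
        for (int r : new int[] {5, 17, 42, 99, 250}) {
            sortedRecords.add(r);
        }
        System.out.println(recordsInRange(sortedRecords, 10, 100)); // prints [17, 42, 99]
    }
}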
Another way is to build something like a TreeMap&lt;Integer, List&lt;Matcher&gt;&gt;, where the value is the list of Matchers that cover the range between the current key and the following key. That can be even better, because the number of Matchers is much smaller than the number of records.
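A sketch of that second idea (Range is a stand-in for the real matcher range type, and the bounds are assumed inclusive): each record then needs a single floorEntry() lookup instead of 250 isInRange() calls.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class MatcherIndex {
    static class Range {
        final int low, high; // inclusive bounds, assumed accessible on the real matcher
        Range(int low, int high) { this.low = low; this.high = high; }
        boolean contains(int v) { return v >= low && v <= high; }
    }

    // Builds a map from each boundary point to the ranges covering [key, nextKey).
    static TreeMap<Integer, List<Range>> index(List<Range> ranges) {
        TreeSet<Integer> boundaries = new TreeSet<Integer>();
        for (Range r : ranges) {
            boundaries.add(r.low);
            boundaries.add(r.high + 1); // coverage ends just past the high bound
        }
        TreeMap<Integer, List<Range>> index = new TreeMap<Integer, List<Range>>();
        for (int b : boundaries) {
            List<Range> covering = new ArrayList<Range>();
            for (Range r : ranges) {
                if (r.contains(b)) covering.add(r);
            }
            index.put(b, covering);
        }
        return index;
    }

    // Returns the ranges covering a record with a single O(log n) lookup.
    static List<Range> matchersFor(TreeMap<Integer, List<Range>> index, int record) {
        Map.Entry<Integer, List<Range>> e = index.floorEntry(record);
        return e == null ? Collections.<Range>emptyList() : e.getValue();
    }
}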
If you have large datasets and want speed and simplicity, consider using a text search engine like Lucene, which can index millions of documents and retrieve hits using quite complex matching parameters in a few milliseconds.
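A rough sketch of that approach, assuming Lucene 8.x and indexing each record ID as an IntPoint (the field name and IDs here are made up; real code would page through all hits rather than take the top 100):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class RecordRangeIndex {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (int id : new int[] {5, 17, 42, 99}) {
                Document doc = new Document();
                doc.add(new IntPoint("recordId", id));    // indexed for range queries
                doc.add(new StoredField("recordId", id)); // stored so it can be read back
                writer.addDocument(doc);
            }
        }
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
        TopDocs hits = searcher.search(IntPoint.newRangeQuery("recordId", 10, 100), 100);
        for (ScoreDoc hit : hits.scoreDocs) {
            System.out.println(searcher.doc(hit.doc).getField("recordId").numericValue());
        }
    }
}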
A single loop doesn't take advantage of multiple cores; it would be better to break the iteration into subsets and process them on separate threads.
For example: divide your array into 6 pieces, one thread for each piece, as in the sketch below.
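One way to do that is with a fixed thread pool; the per-chunk work below is a placeholder, not the real matching logic:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScan {
    // Placeholder chunk work: count "matching" records in records[from, to).
    static long processChunk(int[] records, int from, int to) {
        long matches = 0;
        for (int i = from; i < to; i++) {
            if (records[i] % 2 == 0) matches++; // stand-in for the real range check
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        final int[] records = new int[7_500_000];
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (records.length + threads - 1) / threads;

        List<Future<Long>> results = new ArrayList<Future<Long>>();
        for (int start = 0; start < records.length; start += chunk) {
            final int from = start;
            final int to = Math.min(start + chunk, records.length);
            results.add(pool.submit((Callable<Long>) () -> processChunk(records, from, to)));
        }
        long total = 0;
        for (Future<Long> f : results) {
            total += f.get(); // wait for each chunk and combine the partial results
        }
        pool.shutdown();
        System.out.println("matched " + total + " records");
    }
}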