Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to check if a List<String> contains a unique String

Basically I have about 1,000,000 strings, for each request I have to check if a String belongs to the list or not.

I'm worried about the performance, so what's the best method? ArrayList? Hash?

like image 749
Ben Avatar asked Jul 22 '10 09:07

Ben


People also ask

How do you check if a string contains a string from a list?

Using any() to check if string contains element from list. Using any function is the most classical way in which you can perform this task and also efficiently. This function checks for match in string with match of each element of list.

How do you check if any item in a list is in a string?

Use any() to check if a string contains an element from a list. Call any(iterable) with iterable as a generator expression that checks if any item from the list is in the string. Use the syntax element in string for element in list to build the generator expression.

How do you find a specific string in an ArrayList?

Get the array list. Using the for-each loop get each element of the ArrayList object. Verify whether each element in the array list contains the required string. If so print the elements.

How do you check if a list contains a particular string in Python?

The any() function is used to check the existence of an element in the list. it's like- if any element in the string matches the input element, print that the element is present in the list, else, print that the element is not present in the list. Example: Python3.


2 Answers

Your best bet is to use a HashSet and check if a string exists in the set via the contains() method. HashSets are built for fast access via the use of Object methods hashCode() and equals(). The Javadoc for HashSet states:

This class offers constant time performance for the basic operations (add, remove, contains and size),

HashSet stores objects in hash buckets which is to say that the value returned by the hashCode method will determine which bucket an object is stored in. This way, the amount of equality checks the HashSet has to perform via the equals() method is reduced to just the other Objects in the same hash bucket.

To use HashSets and HashMaps effectively, you must conform to the equals and hashCode contract outlined in the javadoc. In the case of java.lang.String these methods have already been implemented to do this.

like image 154
krock Avatar answered Sep 29 '22 11:09

krock


In general, a HashSet will give you better performance, since it does not have to look through each element and compare, like an ArrayList does, but typically compares at most a few elements, where the hashcodes are equal.

However, for 1M strings, the performance of hashSet may still not be optimal. A lot of cache misses will slow down searching the set. If all strings are equally likely, then this is unavoidable. However, if some strings are more often requested than others, then you can place the common strings into a small hashSet, and check that first, before checking the larger set. The small hashset should be sized to fit in cache (e.g. a few hundred K at most). Hits to the small hashset will then be very fast, while hits to the larger hashset proceed at speed limited by the memory bandwidth.

like image 37
mdma Avatar answered Sep 29 '22 11:09

mdma