Given a fixed list of strings at compile-time e.g.:
"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"
Utilizing HashSet
we have a very fast way (O(1)) to tell if a String
provided at runtime is in the list of strings.
For example:
Set<String> SET = new HashSet<>(Arrays.asList( "zero", "one", "two", "three",
"four", "five", "six", "seven", "eight", "nine"));
boolean listed = SET.contains("some-text");
Are there any other faster ways to tell if a String
provided at runtime is in the initial list of strings (given at compile-time) or is HashSet
the fastest solution?
Given the interface below:
interface Checker {
String[] VALUES = { "zero", "one", "two", "three", "four", "five", "six",
"seven", "eight", "nine" };
boolean contains(String s);
}
Provide the fastest possible implementation given that the values listed in Checker.VALUES
will not be changed (so for example you can use those literals in your code if you want to).
HashSetChecker
ImplementationThe implementation which uses a HashSet
would look like this:
class HashSetChecker implements Checker {
private final Set<String> set = new HashSet<>(Arrays.asList(VALUES));
@Override
public boolean contains(String s) {
return set.contains(s);
}
}
When testing, we want to test the Checker.contains()
method with the initial interned strings, also with other interned strings (String
literals) which will not be found, and also Strings that have the same values (are equals) but are not interned Strings. For this purpose the following CheckerTester.TESTS
array will be used.
public class CheckerTester {
private static final String[] TESTS = { "zero", "one", "two", "three",
"four", "five", "six", "seven", "eight", "nine", "ten", "eleven",
"twelve", "thirteen", "fourteen", "fifteen", "sixteen",
"seventeen", "eighteen", "nineteen",
new String("zero"), new String("one"), new String("two"),
new String("three"), new String("four"), new String("five"),
new String("six"), new String("seven"), new String("eight"),
new String("nine"), new String("ten"), new String("eleven"),
new String("twelve"), new String("thirteen"),
new String("fourteen"), new String("fifteen"),
new String("sixteen"), new String("seventeen"),
new String("eighteen"), new String("nineteen") };
public static void test(Checker checker) {
final int N = 1_000_000;
long start = System.nanoTime();
for (int i = 0; i < N; i++)
for (String test : TESTS)
checker.contains(test);
long end = System.nanoTime();
System.out.printf("%s: %d ms\n", checker.getClass().getName(),
(end - start) / 1_000_000);
}
}
Let's see some implementations:
HashSet
with big capacitySome might say that multiple Strings might end up in the same HashSet
bucket, so let's use a big initial capacity:
class HashSetChecker2 implements Checker {
private final Set<String> set = new HashSet<>(1000);
{ set.addAll(Arrays.asList(VALUES)); }
@Override
public boolean contains(String s) {
return set.contains(s);
}
}
HashMap
instead of HashSet
HashSet
uses HashMap
in its implementation, let's just do the same: get rid of the "shell" HashSet
:
class HashMapChecker implements Checker {
private final Map<String, Object> map = new HashMap<>(1000);
{
for (String s : VALUES)
map.put(s, s);
}
@Override
public boolean contains(String s) {
return map.containsKey(s);
}
}
TreeSet
Some might say give TreeSet
a try too (it's sorted so it might have a chance). I know it's O(log(n)), but n
is small (10 in this case):
class TreeSetChecker implements Checker {
private final Set<String> set = new TreeSet<>(Arrays.asList(VALUES));
@Override
public boolean contains(String s) {
return set.contains(s);
}
}
class OrChecker implements Checker {
@Override
public boolean contains(String s) {
return "zero".equals(s) || "one".equals(s) || "two".equals(s)
|| "three".equals(s) || "four".equals(s) || "five".equals(s)
|| "six".equals(s) || "seven".equals(s) || "eight".equals(s)
|| "nine".equals(s);
}
}
Some might say that we should first check if by reference we have the String
and if not then revert to short-circuit OR checks:
class RefOrChecker extends OrChecker {
@Override
public boolean contains(String s) {
return "zero" == s || "one" == s || "two" == s || "three" == s
|| "four" == s || "five" == s || "six" == s || "seven" == s
|| "eight" == s || "nine" == s || super.contains(s);
}
}
switch
: fastest I've found so farSince we have a fixed list of String
s known at compile-time, we can take advantage of the possibility that we can use String
s in switch
statements.
We can add a case
for each String
from the fixed list and return true
, and add a default
case to return false
.
class SwitchChecker implements Checker {
@Override
public boolean contains(String s) {
switch (s) {
case "zero":
case "one":
case "two":
case "three":
case "four":
case "five":
case "six":
case "seven":
case "eight":
case "nine":
return true;
default:
return false;
}
}
}
switch
blocks)Maaartinus's answer about perfect hashing made me thinking. Even if we have a perfect hash, it still has to run on the whole content of the String
provided at runtime which we want to check. So instead we should use something that is available right in the String
: its length. Based on the length of the String
we use a switch
, and inside that switch
we use an internal switch
only listing the strings with the specified length. With this, we reduce the number of case
statements inside a switch
:
class EmbeddedSwitchChecker implements Checker {
@Override
public boolean contains(String s) {
switch (s.length()) {
case 3:
switch (s) {
case "one":
case "two":
case "six":
return true;
default:
return false;
}
case 4:
switch (s) {
case "zero":
case "four":
case "five":
case "nine":
return true;
default:
return false;
}
case 5:
switch (s) {
case "three":
case "seven":
case "eight":
return true;
default:
return false;
}
default:
return false;
}
}
}
This is basically the combination of an improved EmbeddedSwitchChecker
and OldCurmudgeon's idea about a state machine: here we use the switch
on the first character of the String
(but first we check its length), and based on that we either narrowed down to one possible String
or if not, we check the 2nd character too in which case the possible String
can be only one (and we can decide by calling String.equals()
):
class CharSwitchChecker implements Checker {
@Override
public boolean contains(String s) {
final int length = s.length();
if (length < 3 || length > 5)
return false;
switch (s.charAt(0)) {
case 'z':
return "zero".equals(s);
case 'o':
return "one".equals(s);
case 't':
return s.charAt(1) == 'w' ? "two".equals(s) : "three".equals(s);
case 'f':
return s.charAt(1) == 'o' ? "four".equals(s) : "five".equals(s);
case 's':
return s.charAt(1) == 'i' ? "six".equals(s) : "seven".equals(s);
case 'e':
return "eight".equals(s);
case 'n':
return "nine".equals(s);
}
return false;
}
}
Here are the test results:
TIME HOW FAST (compared to HashSetChecker)
-----------------------------------------------------------------------------
HashSetChecker: 929 ms 1.00x
HashSetChecker2: 892 ms 1.04x
HashMapChecker: 873 ms 1.06x
TreeSetChecker: 2265 ms 0.41x
OrChecker: 1815 ms 0.51x
RefOrChecker: 1708 ms 0.54x
SwitchChecker: 538 ms 1.73x
EmbeddedSwitchChecker: 467 ms 1.99x
CharSwitchChecker: 436 ms 2.13x
The SwitchChecker
solution is about 1.7 times faster, the EmbeddedSwitchChecker
is 2 times faster and the champion CharSwitchChecker
is about 2.13 times faster than the HashSetChecker
implementation. As expected the HashSet
with big initial capacity and the HashMap
solutions are slightly faster, and all other solutions fall behind.
The Complete Runnalbe Test Program plus All listed solutions are here in one box for those who want to try it or experiment with new implementations.
Edit: Following Luiggi Mendoza's suggestion for rules for performing a micro benchmark, I changed the main()
method for testing. I execute the whole test twice, and I only analyze the 2nd result. Also since the tests don't create new objects in the loop, I see no reason to call System.gc()
whatsoever.
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
interface Checker {
String[] VALUES = { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };
boolean contains(String s);
}
class HashSetChecker implements Checker {
private final Set<String> set = new HashSet<>(Arrays.asList(VALUES));
@Override
public boolean contains(String s) {
return set.contains(s);
}
}
class HashSetChecker2 implements Checker {
private final Set<String> set = new HashSet<>(1000);
{
set.addAll(Arrays.asList(VALUES));
}
@Override
public boolean contains(String s) {
return set.contains(s);
}
}
class HashMapChecker implements Checker {
private final Map<String, Object> map = new HashMap<>(1000);
{
for (String s : VALUES)
map.put(s, s);
}
@Override
public boolean contains(String s) {
return map.containsKey(s);
}
}
class TreeSetChecker implements Checker {
private final Set<String> set = new TreeSet<>(Arrays.asList(VALUES));
@Override
public boolean contains(String s) {
return set.contains(s);
}
}
class OrChecker implements Checker {
@Override
public boolean contains(String s) {
return "zero".equals(s) || "one".equals(s) || "two".equals(s) || "three".equals(s)
|| "four".equals(s) || "five".equals(s) || "six".equals(s) || "seven".equals(s)
|| "eight".equals(s) || "nine".equals(s);
}
}
class RefOrChecker extends OrChecker {
@Override
public boolean contains(String s) {
return "zero" == s || "one" == s || "two" == s || "three" == s || "four" == s || "five" == s
|| "six" == s || "seven" == s || "eight" == s || "nine" == s || super.contains(s);
}
}
class SwitchChecker implements Checker {
@Override
public boolean contains(String s) {
switch (s) {
case "zero":
case "one":
case "two":
case "three":
case "four":
case "five":
case "six":
case "seven":
case "eight":
case "nine":
return true;
default:
return false;
}
}
}
class EmbeddedSwitchChecker implements Checker {
@Override
public boolean contains(String s) {
switch (s.length()) {
case 3:
switch (s) {
case "one":
case "two":
case "six":
return true;
default:
return false;
}
case 4:
switch (s) {
case "zero":
case "four":
case "five":
case "nine":
return true;
default:
return false;
}
case 5:
switch (s) {
case "three":
case "seven":
case "eight":
return true;
default:
return false;
}
default:
return false;
}
}
}
class CharSwitchChecker implements Checker {
@Override
public boolean contains(String s) {
final int length = s.length();
if (length < 3 || length > 5)
return false;
switch (s.charAt(0)) {
case 'z':
return "zero".equals(s);
case 'o':
return "one".equals(s);
case 't':
return s.charAt(1) == 'w' ? "two".equals(s) : "three".equals(s);
case 'f':
return s.charAt(1) == 'o' ? "four".equals(s) : "five".equals(s);
case 's':
return s.charAt(1) == 'i' ? "six".equals(s) : "seven".equals(s);
case 'e':
return "eight".equals(s);
case 'n':
return "nine".equals(s);
}
return false;
}
}
public class CheckerTester {
private static final String[] TESTS = { "zero", "one", "two", "three", "four", "five", "six", "seven",
"eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
"seventeen", "eighteen", "nineteen",
new String("zero"), new String("one"), new String("two"), new String("three"),
new String("four"), new String("five"), new String("six"), new String("seven"),
new String("eight"), new String("nine"), new String("ten"), new String("eleven"),
new String("twelve"), new String("thirteen"), new String("fourteen"), new String("fifteen"),
new String("sixteen"), new String("seventeen"), new String("eighteen"), new String("nineteen") };
public static void test(Checker checker) {
final int N = 1_000_000;
long start = System.nanoTime();
for (int i = 0; i < N; i++)
for (String test : TESTS)
checker.contains(test);
long end = System.nanoTime();
System.out.printf("%s: %d ms\n", checker.getClass().getName(), (end - start) / 1_000_000);
}
public static void main(String args[]) {
for (int i = 1; i <= 2; i++) {
System.out.println("---- Check #" + i);
test(new HashSetChecker());
test(new HashSetChecker2());
test(new HashMapChecker());
test(new TreeSetChecker());
test(new OrChecker());
test(new RefOrChecker());
test(new SwitchChecker());
test(new EmbeddedSwitchChecker());
test(new CharSwitchChecker());
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With