Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to compare strings (literal and numerical)

I have a performance problem related to string comparison (in Java).

I'm working on a project that needs to sort a huge list (a TableViewer in Eclipse). Anyway I've pinpointed the bottleneck down to the call to compareTo() for the string to be compared.

Is there any way to optimize the performance of a string compare? I've searched and googled to no avail...

As the project is strictly limited to a Win32 environment, I was thinking that maybe it would be possible to leverage on that...

Any suggestions would be greatly appreciated.

EDIT: I forgot to mention that I would need both numerical comparison and literal comparison of the strings.

EDIT2: The goal is essentially to speed up the user interface because it is unacceptable to wait a few seconds each time you click on the header of the table to perform a sort. I'm looking into maybe caching values somehow to speed up the comparison. As the strings are pretty much static I think it would be possible.

EDIT3: I know a lot of you have been disturbed by the try()-catch() thing. Actually that is less of a concern because even if I remove that code and only execute the catch-block (a single compareTo()) it still executes at virtually the same speed as the original code. If however I comment out the compareTo() also; leaving me with only the overhead of the compare function (getting labels, etc.) it is lightning fast. So I still need a better way to compare strings. Either by caching or by doing some other magic.

Unfortunately it is not possible to change the sorting algorithm - however I doubt that it is that slow because it succeeds in sorting pure integers quite fast.

CLARIFICATION:

The compare function is implemented as part of the TableViewer framework for performing sort operations which means that I'm not implementing the specific sorting algorithm but rather it is implemented by SWT/JFace. I'm only implementing the compare function.

What is further more interesting is the fact that the code for sorting doubles is faster than the string comparison. It is faster to sort columns with only numbers than with actual literal strings.... Which leads me to the conclusion that something fishy is going on in the compareTo() method...

Here is the core of the function:

// e1Label and e2Label is Strings to be compared
//

// Be smart about the comparison and use non-lexical comparison if
// possible (i.e. if both strings are actually numbers...)
//
// Warning: This is only "semi-smart" as the sorting might get "a bit"
// messed up if some of the values in a column can be parsed as
// doubles while others can not...
//
try {
    // Try using numeric (double) comparison of label values
    //
    double e1_double = Double.parseDouble(e1Label);
    double e2_double = Double.parseDouble(e2Label);
    rc = Double.compare(e1_double, e2_double);
} catch (NumberFormatException e) {
    // Use lexical comparison if double comparison is not possible
    //
    rc = e1Label.compareToIgnoreCase(e2Label);
}
like image 312
Anders Hansson Avatar asked Aug 27 '09 09:08

Anders Hansson


1 Answers

If you have knowledge about your String content you can pre-compute and store additional information to speed up the comparison. For example, suppose your Strings only contained capital letters A-Z. You could assign a rank to the String based on say the first 3 letters; e.g.

  • AAA := 1
  • AAB := 2
  • ...
  • ABA := 27

Then you could speed up your compareTo by first comparing each String's rank (fast int based comparison) and then only performing a full String compare if the ranks were equal.

like image 133
Adamski Avatar answered Sep 24 '22 10:09

Adamski