Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort on a string that may contain a number

I need to write a Java Comparator class that compares Strings, however with one twist. If the two strings it is comparing are the same at the beginning and end of the string are the same, and the middle part that differs is an integer, then compare based on the numeric values of those integers. For example, I want the following strings to end up in order they're shown:

  • aaa
  • bbb 3 ccc
  • bbb 12 ccc
  • ccc 11
  • ddd
  • eee 3 ddd jpeg2000 eee
  • eee 12 ddd jpeg2000 eee

As you can see, there might be other integers in the string, so I can't just use regular expressions to break out any integer. I'm thinking of just walking the strings from the beginning until I find a bit that doesn't match, then walking in from the end until I find a bit that doesn't match, and then comparing the bit in the middle to the regular expression "[0-9]+", and if it compares, then doing a numeric comparison, otherwise doing a lexical comparison.

Is there a better way?

Update I don't think I can guarantee that the other numbers in the string, the ones that may match, don't have spaces around them, or that the ones that differ do have spaces.

like image 351
Paul Tomblin Avatar asked Sep 19 '08 19:09

Paul Tomblin


People also ask

How do you sort a string with a number inside?

To do this we can use the extra parameter that sort() uses. This is a function that is called to calculate the key from the entry in the list. We use regex to extract the number from the string and sort on both text and number.

How do you sort a number in an array of strings?

Using the Arrays.util package that provides sort() method to sort an array in ascending order. It uses Dual-Pivot Quicksort algorithm for sorting. Its complexity is O(n log(n)). It is a static method that parses an array as a parameter and does not return anything.


2 Answers

The Alphanum Algorithm

From the website

"People sort strings with numbers differently than software. Most sorting algorithms compare ASCII values, which produces an ordering that is inconsistent with human logic. Here's how to fix it."

Edit: Here's a link to the Java Comparator Implementation from that site.

like image 55
ScArcher2 Avatar answered Sep 27 '22 19:09

ScArcher2


Interesting little challenge, I enjoyed solving it.

Here is my take at the problem:

String[] strs = {   "eee 5 ddd jpeg2001 eee",   "eee 123 ddd jpeg2000 eee",   "ddd",   "aaa 5 yy 6",   "ccc 555",   "bbb 3 ccc",   "bbb 9 a",   "",   "eee 4 ddd jpeg2001 eee",   "ccc 11",   "bbb 12 ccc",   "aaa 5 yy 22",   "aaa",   "eee 3 ddd jpeg2000 eee",   "ccc 5", };  Pattern splitter = Pattern.compile("(\\d+|\\D+)");  public class InternalNumberComparator implements Comparator {   public int compare(Object o1, Object o2)   {     // I deliberately use the Java 1.4 syntax,      // all this can be improved with 1.5's generics     String s1 = (String)o1, s2 = (String)o2;     // We split each string as runs of number/non-number strings     ArrayList sa1 = split(s1);     ArrayList sa2 = split(s2);     // Nothing or different structure     if (sa1.size() == 0 || sa1.size() != sa2.size())     {       // Just compare the original strings       return s1.compareTo(s2);     }     int i = 0;     String si1 = "";     String si2 = "";     // Compare beginning of string     for (; i < sa1.size(); i++)     {       si1 = (String)sa1.get(i);       si2 = (String)sa2.get(i);       if (!si1.equals(si2))         break;  // Until we find a difference     }     // No difference found?     if (i == sa1.size())       return 0; // Same strings!      // Try to convert the different run of characters to number     int val1, val2;     try     {       val1 = Integer.parseInt(si1);       val2 = Integer.parseInt(si2);     }     catch (NumberFormatException e)     {       return s1.compareTo(s2);  // Strings differ on a non-number     }      // Compare remainder of string     for (i++; i < sa1.size(); i++)     {       si1 = (String)sa1.get(i);       si2 = (String)sa2.get(i);       if (!si1.equals(si2))       {         return s1.compareTo(s2);  // Strings differ       }     }      // Here, the strings differ only on a number     return val1 < val2 ? -1 : 1;   }    ArrayList split(String s)   {     ArrayList r = new ArrayList();     Matcher matcher = splitter.matcher(s);     while (matcher.find())     {       String m = matcher.group(1);       r.add(m);     }     return r;   } }  Arrays.sort(strs, new InternalNumberComparator()); 

This algorithm need much more testing, but it seems to behave rather nicely.

[EDIT] I added some more comments to be clearer. I see there are much more answers than when I started to code this... But I hope I provided a good starting base and/or some ideas.

like image 32
PhiLho Avatar answered Sep 27 '22 17:09

PhiLho