Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java - Sort Strings like Windows Explorer

I am trying to use code suggested by Sander Pham on another question. I need my java ArrayList of string names to be sorted like Windows Explorer does. His code worked for everything but for one issue. I would have liked to comment onto that question, but I need more reputation points to comment. Anyways... He suggested to use a custom comparator implemented class and use that to compare the string names. Here is the code of that class:

class IntuitiveStringComparator implements Comparator<String>
{
    private String str1, str2;
    private int pos1, pos2, len1, len2;

    public int compare(String s1, String s2)
    {
        str1 = s1;
        str2 = s2;
        len1 = str1.length();
        len2 = str2.length();
        pos1 = pos2 = 0;

        int result = 0;
        while (result == 0 && pos1 < len1 && pos2 < len2)
        {
            char ch1 = str1.charAt(pos1);
            char ch2 = str2.charAt(pos2);

            if (Character.isDigit(ch1))
            {
                result = Character.isDigit(ch2) ? compareNumbers() : -1;
            }
            else if (Character.isLetter(ch1))
            {
                result = Character.isLetter(ch2) ? compareOther(true) : 1;
            }
            else
            {
                result = Character.isDigit(ch2) ? 1
                : Character.isLetter(ch2) ? -1
                : compareOther(false);
            }

            pos1++;
            pos2++;
        }

        return result == 0 ? len1 - len2 : result;
    }

    private int compareNumbers()
    {
        // Find out where the digit sequence ends, save its length for
        // later use, then skip past any leading zeroes.
        int end1 = pos1 + 1;
        while (end1 < len1 && Character.isDigit(str1.charAt(end1)))
        {
            end1++;
        }
        int fullLen1 = end1 - pos1;
        while (pos1 < end1 && str1.charAt(pos1) == '0')
        {
            pos1++;
        }

        // Do the same for the second digit sequence.
        int end2 = pos2 + 1;
        while (end2 < len2 && Character.isDigit(str2.charAt(end2)))
        {
            end2++;
        }
        int fullLen2 = end2 - pos2;
        while (pos2 < end2 && str2.charAt(pos2) == '0')
        {
            pos2++;
        }

        // If the remaining subsequences have different lengths,
        // they can't be numerically equal.
        int delta = (end1 - pos1) - (end2 - pos2);
        if (delta != 0)
        {
            return delta;
        }

        // We're looking at two equal-length digit runs; a sequential
        // character comparison will yield correct results.
        while (pos1 < end1 && pos2 < end2)
        {
            delta = str1.charAt(pos1++) - str2.charAt(pos2++);
            if (delta != 0)
            {
                return delta;
            }
        }

        pos1--;
        pos2--;

        // They're numerically equal, but they may have different
        // numbers of leading zeroes. A final length check will tell.
        return fullLen2 - fullLen1;
    }

    private int compareOther(boolean isLetters)
    {
        char ch1 = str1.charAt(pos1);
        char ch2 = str2.charAt(pos2);

        if (ch1 == ch2)
        {
            return 0;
        }

        if (isLetters)
        {
            ch1 = Character.toUpperCase(ch1);
            ch2 = Character.toUpperCase(ch2);
            if (ch1 != ch2)
            {
                ch1 = Character.toLowerCase(ch1);
                ch2 = Character.toLowerCase(ch2);
            }
        }

        return ch1 - ch2;
    }   
}

In using this, it works great except for if the string name does not have a number after it. If it does not have a number, it is put at the end of the list, which is wrong. If it doesn't have a number, it should be at the beginning.

i.e.

filename.jpg
filename2.jpg
filename03.jpg
filename3.jpg

Currently it sorts that...

filename2.jpg
filename03.jpg
filename3.jpg
filename.jpg

What do I need to change in the code to correct this behavior?

Thanks

like image 338
CrackaB Avatar asked Apr 21 '14 20:04

CrackaB


People also ask

Can String be sorting in Java?

The string class doesn't have any method that directly sorts a string, but we can sort a string by applying other methods one after another. The string is a sequence of characters. In java, objects of String are immutable which means a constant and cannot be changed once created.

How do I sort by file type in Windows Explorer?

Sort Files and FoldersIn the desktop, click or tap the File Explorer button on the taskbar. Open the folder that contains the files you want to group. Click or tap the Sort by button on the View tab. Select a sort by option on the menu.


1 Answers

This is my second try to answer this. I used http://www.interact-sw.co.uk/iangblog/2007/12/13/natural-sorting as a start. Unfortunatly I think I found there problems as well. But I think in my code these problems are correctly adressed.

Info: Windows Explorer uses the API function StrCmpLogicalW() function to do its sorting. There it is called natural sort order.

So here is my unterstanding of the WindowsExplorerSort - Algorithm:

  • Filenames are compared part wise. As for now I identified the following parts: numbers, '.', spaces and the rest.
  • Each number within the filename is considered for a possible number compare.
  • Numbers are compared as numbers but if they are equal, the longer base string comes first. This happens with leading zeros.
    • filename00.txt, filename0.txt
  • If one compares a number part with a non number part, it will be compared as text.
  • Text will be compared case insensitive.

This list is based partly on try and error. I increased the number of test filenames, to adress more of the in comments mentioned pitfalls and the result was checked against a Windows Explorer.

So here is the output of this:

filename
filename 00
filename 0
filename 01
filename.jpg
filename.txt
filename00.jpg
filename00a.jpg
filename00a.txt
filename0
filename0.jpg
filename0a.txt
filename0b.jpg
filename0b1.jpg
filename0b02.jpg
filename0c.jpg
filename01.0hjh45-test.txt
filename01.0hjh46
filename01.1hjh45.txt
filename01.hjh45.txt
Filename01.jpg
Filename1.jpg
filename2.hjh45.txt
filename2.jpg
filename03.jpg
filename3.jpg

The new comparator WindowsExplorerComparator splits the filename in the already mentioned parts and does a part wise comparing of two filenames. To be correct, the new comparator uses Strings as its input so one has to create an adaptor Comparator like

new Comparator<File>() {
    private final Comparator<String> NATURAL_SORT = new WindowsExplorerComparator();

    @Override
    public int compare(File o1, File o2) {;
        return NATURAL_SORT.compare(o1.getName(), o2.getName());
    }
}

So here is the new Comparators source code and its test:

import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WindowsSorter {

    public static void main(String args[]) {
        //huge test data set ;)
        List<File> filenames = Arrays.asList(new File[]{new File("Filename01.jpg"),
            new File("filename"), new File("filename0"), new File("filename 0"),
            new File("Filename1.jpg"), new File("filename.jpg"), new File("filename2.jpg"), 
            new File("filename03.jpg"), new File("filename3.jpg"), new File("filename00.jpg"),
            new File("filename0.jpg"), new File("filename0b.jpg"), new File("filename0b1.jpg"),
            new File("filename0b02.jpg"), new File("filename0c.jpg"), new File("filename00a.jpg"),
            new File("filename.txt"), new File("filename00a.txt"), new File("filename0a.txt"),
            new File("filename01.0hjh45-test.txt"), new File("filename01.0hjh46"),
            new File("filename2.hjh45.txt"), new File("filename01.1hjh45.txt"),
            new File("filename01.hjh45.txt"), new File("filename 01"),
            new File("filename 00")});

        //adaptor for comparing files
        Collections.sort(filenames, new Comparator<File>() {
            private final Comparator<String> NATURAL_SORT = new WindowsExplorerComparator();

            @Override
            public int compare(File o1, File o2) {;
                return NATURAL_SORT.compare(o1.getName(), o2.getName());
            }
        });

        for (File f : filenames) {
            System.out.println(f);
        }
    }

    public static class WindowsExplorerComparator implements Comparator<String> {

        private static final Pattern splitPattern = Pattern.compile("\\d+|\\.|\\s");

        @Override
        public int compare(String str1, String str2) {
            Iterator<String> i1 = splitStringPreserveDelimiter(str1).iterator();
            Iterator<String> i2 = splitStringPreserveDelimiter(str2).iterator();
            while (true) {
                //Til here all is equal.
                if (!i1.hasNext() && !i2.hasNext()) {
                    return 0;
                }
                //first has no more parts -> comes first
                if (!i1.hasNext() && i2.hasNext()) {
                    return -1;
                }
                //first has more parts than i2 -> comes after
                if (i1.hasNext() && !i2.hasNext()) {
                    return 1;
                }

                String data1 = i1.next();
                String data2 = i2.next();
                int result;
                try {
                    //If both datas are numbers, then compare numbers
                    result = Long.compare(Long.valueOf(data1), Long.valueOf(data2));
                    //If numbers are equal than longer comes first
                    if (result == 0) {
                        result = -Integer.compare(data1.length(), data2.length());
                    }
                } catch (NumberFormatException ex) {
                    //compare text case insensitive
                    result = data1.compareToIgnoreCase(data2);
                }

                if (result != 0) {
                    return result;
                }
            }
        }

        private List<String> splitStringPreserveDelimiter(String str) {
            Matcher matcher = splitPattern.matcher(str);
            List<String> list = new ArrayList<String>();
            int pos = 0;
            while (matcher.find()) {
                list.add(str.substring(pos, matcher.start()));
                list.add(matcher.group());
                pos = matcher.end();
            }
            list.add(str.substring(pos));
            return list;
        }
    }
}
like image 134
wumpz Avatar answered Nov 15 '22 23:11

wumpz