
What is the best way to count and sort a string array

I am trying to find out whether there is a good way to search a String array (counting the number of occurrences of each character) and then sort it efficiently, that is, in a way that will work well on embedded systems (32 MB).

Example: I have to count the number of times each character (A, B, C, etc.) is used and save that result for later sorting.

I can count using a public int count(String searchDomain, char searchValue) method, but each string should have a count for every letter of the alphabet, for instance:

"This is a test string"
A:1,B:0,C:0,D:0,E:1,I:3,F:0,...
"ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCC"
A:7,B:0,C:22,G:18

My sorting method needs to be able to answer requests like: sort by the number of As or Bs, or sort first by As and then sort that subdomain by Bs.

This is not for homework; it's for an application that needs to run on mobile phones. I need this to be efficient, because my current implementation is too slow and uses too much memory.

Astronaut asked Feb 13 '12 17:02


1 Answer

I'd take advantage of Java's (very efficient) built-in sorting capabilities. To start with, define a simple class to contain your string and its metadata:

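class Item
{
    // Your string. It's public, so you can get it if you want,
    // but also final, so you can't accidentally change it.
    public final String string;

    // An array of counts, where the offset is the alphabetical position
    // of the letter it's counting. (A = 0, B = 1, C = 2...)
    // Only the first 26 slots are used; 32 leaves a little headroom.
    private final short[] instanceCounts = new short[32];

    public Item(String string)
    {
        this.string = string;
        for(char c : string.toCharArray())
        {
            // Fold to upper case so 'a' and 'A' share a slot, and skip
            // anything outside A-Z (spaces, digits, punctuation), which
            // would otherwise index outside the array.
            char upper = Character.toUpperCase(c);
            if (upper >= 'A' && upper <= 'Z')
            {
                // Increment the count for this character
                instanceCounts[upper - 'A']++;
            }
        }
    }

    // Returns the count for a letter (case-insensitive), or 0 for
    // any character that is not a letter A-Z.
    public int getCount(char c)
    {
        char upper = Character.toUpperCase(c);
        return (upper >= 'A' && upper <= 'Z') ? instanceCounts[upper - 'A'] : 0;
    }
}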

This will hold your String (for searching and display), and set up an array of shorts with the count of the matching characters. (If you're really low on memory and you know your strings never have more than 127 of any one character, you can even change this to an array of bytes, since a Java byte is signed.) A short is only 16 bits (2 bytes), so the array's data will only take 64 bytes altogether, plus a small fixed array header, regardless of how complex your string. If you'd rather pay the performance hit for calculating the counts every time, you can get rid of the array and replace the getCount() method, but you'll probably end up saving once-off memory by consuming frequently-garbage-collected memory, which is a big performance hit. :)
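For comparison, here is a minimal sketch of the "calculate the counts every time" alternative mentioned above. The class name OnDemandCounter and its count method are hypothetical, not part of the original answer. This version reads characters with charAt, so it does not allocate a temporary array, but it rescans the whole string on every call, and a sort calls the comparator many times per item.

```java
// Hypothetical helper (not from the original answer): counts occurrences
// on demand instead of caching them in a per-item array.
final class OnDemandCounter {
    private OnDemandCounter() {}

    // Counts how often the given letter appears in text, ignoring case.
    static int count(String text, char letter) {
        char target = Character.toUpperCase(letter);
        int total = 0;
        // charAt avoids the temporary char[] that toCharArray() allocates.
        for (int i = 0; i < text.length(); i++) {
            if (Character.toUpperCase(text.charAt(i)) == target) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count("This is a test string", 'I')); // prints 3
        System.out.println(count("ABCAA", 'a'));                 // prints 3
    }
}
```

Whether this trade of CPU time for memory pays off depends on how often you sort, so it is worth measuring on the target device.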

Now, define the rule you want to sort on using a Comparator. For example, to sort by the number of A's in your string:

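import java.util.Comparator;

class CompareByNumberOfA implements Comparator<Item>
{
    // Returns a negative number when arg0 has more A's than arg1,
    // so items with the most A's sort first (descending order).
    public int compare(Item arg0, Item arg1)
    {
        return arg1.getCount('A') - arg0.getCount('A');
    }
}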

Finally, stick all of your items in an array, and use the built-in (and highly memory efficient) Arrays methods to sort. For example:

import java.util.Arrays;

// The class name is arbitrary; main() just needs to live in some class.
public class SortDemo
{
    public static void main(String[] args)
    {
        Item[] items = new Item[5];
        items[0] = new Item("ABC");
        items[1] = new Item("ABCAA");
        items[2] = new Item("ABCAAC");
        items[3] = new Item("ABCAAA");
        items[4] = new Item("ABBABZ");

        // THIS IS THE IMPORTANT PART!
        Arrays.sort(items, new CompareByNumberOfA());

        // Prints ABCAAA, ABCAA, ABCAAC, ABBABZ, ABC (most A's first;
        // the sort is stable, so ties keep their original order).
        for (Item item : items)
        {
            System.out.println(item.string);
        }
    }
}

You can define a whole bunch of comparators, and use them how you like.
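As one way to handle the "sort first by As and then by Bs" case from the question, a comparator can take a list of letters and use each later letter only to break ties. This is a sketch under assumptions: CountedString is a compact stand-in for the Item class so the example compiles on its own, and CompareByLetters and MultiKeySortDemo are hypothetical names, not part of the original answer.

```java
import java.util.Arrays;
import java.util.Comparator;

// Compact stand-in for Item (assumption: only A-Z matter, case-insensitive).
class CountedString {
    final String string;
    private final short[] counts = new short[26];

    CountedString(String string) {
        this.string = string;
        for (int i = 0; i < string.length(); i++) {
            char c = Character.toUpperCase(string.charAt(i));
            if (c >= 'A' && c <= 'Z') {
                counts[c - 'A']++;
            }
        }
    }

    int getCount(char c) {
        char u = Character.toUpperCase(c);
        return (u >= 'A' && u <= 'Z') ? counts[u - 'A'] : 0;
    }
}

// Hypothetical comparator: orders by the first letter's count (descending)
// and falls back to each following letter only when the counts tie.
class CompareByLetters implements Comparator<CountedString> {
    private final char[] letters;

    CompareByLetters(char... letters) {
        this.letters = letters;
    }

    public int compare(CountedString a, CountedString b) {
        for (char letter : letters) {
            int diff = b.getCount(letter) - a.getCount(letter);
            if (diff != 0) {
                return diff;
            }
        }
        return 0; // equal on every requested letter
    }
}

class MultiKeySortDemo {
    // Wraps the inputs, sorts them by the given letters, and returns the strings.
    static String[] sortedStrings(String[] input, char... letters) {
        CountedString[] items = new CountedString[input.length];
        for (int i = 0; i < input.length; i++) {
            items[i] = new CountedString(input[i]);
        }
        Arrays.sort(items, new CompareByLetters(letters));
        String[] out = new String[items.length];
        for (int i = 0; i < items.length; i++) {
            out[i] = items[i].string;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] sorted = sortedStrings(new String[] {"ABC", "AAB", "AABB", "B"}, 'A', 'B');
        System.out.println(Arrays.toString(sorted)); // prints [AABB, AAB, ABC, B]
    }
}
```

Because the counts are computed once per string, each comparison is just a few array reads, however long the strings are.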

One of the things to remember about coding with Java is not to get too clever. Compilers do a damn fine job of optimizing for their platform, as long as you take advantage of things they can optimize (like built-in APIs including Arrays.sort).

Often, if you try to get too clever, you'll just optimize yourself right out of an efficient solution. :)

Erica answered Nov 15 '22 14:11