Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently get subset of strings "startingWith" out of a set

I have a big set of strings and I want to create an autosuggest feature for it.

Assume the set is ["foo", "fighter"]

Typing "f" should return both values, and typing "fo" should only return "foo".

Currently I am just iterating through the set and filering out results by calling startsWith, however it's too slow.

The standard TreeSet with its subset functions will not help much here as it only implements a RB tree.

Is there an efficient solution in the Java API or do I have to build my own Set implementation?


Edit: My implementation looks like this, using Andrey Naumenkos trie datastructures. Notice to increase the array size if you want to use extended ASCII characters. If you use a List instead of the Map you get the results in a sorted order.

public Set<String> getSubset(String s) {
    result = new HashSet<String>();
    getSubset(root, s);
    return result;
}

private void getSubset(TrieNode node, String s) {
    TrieNode n = node;
    for (char ch : s.toCharArray()) {
        if (n.children[ch] != null) {
            n = n.children[ch];
            continue;
        }
        return;
    }
    getSubsetR(n, s);
}

private void getSubsetR(TrieNode node, String s) {
    for (char ch = 0; ch < node.children.length; ch++) {
        TrieNode child = node.children[ch];
        if (child != null)
            getSubsetR(child, s + ch);
    }
    if (node.leaf) {
        result.add(s);
    }
}
like image 854
Konstantin W Avatar asked Apr 16 '15 13:04

Konstantin W


1 Answers

What you are looking for is a prefix tree data structure: http://en.wikipedia.org/wiki/Trie

The code here will get you started: https://sites.google.com/site/indy256/algo/trie

like image 161
Neithrik Avatar answered Oct 20 '22 22:10

Neithrik