Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Match binary pattern including "don't cares"

I have a set of bit patterns, and want to find the index of the element in the set which matches a given input. The bit pattern contains "don't care" bits, that is x-es which matches both 0 and 1.

Example The set of bit patterns are

index abcd
   0  00x1
   1  01xx
   2  100x
   3  1010
   4  1x11

Then, trying to match 0110 should return index 1 and 1011 should return index 4.

How can this problem be solved faster than a linear search through the elements? I guess a kind of binary tree could be made, but then, what is a smart way of creating such a tree? Are there other efficient data structures/algorithms for such a problem, primarily in terms of query efficiency both also storage requirements.

  • The bit patterns will be 64 bits (or more)
  • The number of elements in the set will be in the order 10^5 - 10^7
  • Not all bit combinations are represented in the set, e.g in the example 0000 is not represented
  • There will be a high number of x-es in the data set
  • A bit string will match only one of the elements in the set

I have two different cases in which I need to solve this problem

  • Case 1: I have the possibility of doing a lot of precomputing
  • Case 2: New elements will be added to the set on the fly

Update The x-es are more likely to show up in some bit positions than others, that is, some bit positions will be dominated by x-es while others will be mainly zeroes/ones.

like image 950
Petter T Avatar asked Feb 10 '14 09:02

Petter T


1 Answers

I think you can build a trie tree for the bit patterns, the node contains the original index of the pattern.

To complete the match is just to search in a trie tree, when the trie node contains the same bit of 'x', go to the next node. The result may contain multiple indexes for a certain input.

Here is my solution,

public class Solution {

    public static class Trie<T> {
        private final Character WILD = 'x';
        private Map<Character, Trie> children;
        private boolean isNode;
        private T value;

        public Trie() {
            children = new HashMap<Character, Trie>();
            isNode = false;
            value = null;
        }

        public void insert(String key, T value) {
            Trie<T> current = this;
            for (int i = 0; i < key.length(); i++) {
                char c = key.charAt(i);
                if (current.children.containsKey(c)) {
                    current = current.children.get(c);
                } else {
                    Trie<T> next = new Trie();
                    current.children.put(c, next);
                    current = next;
                }
            }
            current.isNode = true;
            current.value = value;
        }

        public List<T> get(String key) {
            List<T> result = new ArrayList<T>();
            get(this, key.toCharArray(), 0, result);
            return result;
        }

        private void get(Trie<T> trie, char[] chars, int index, List<T> result) {
            if (index == chars.length) {
                if (trie != null && trie.isNode) {
                    result.add(trie.value);
                }
                return;
            }
            char c = chars[index];
            if (trie.children.containsKey(c)) {
                get(trie.children.get(c), chars, index + 1, result);
            }
            if (trie.children.containsKey(WILD)) {
                get(trie.children.get(WILD), chars, index + 1, result);
            }
        }
    }

    public static void main(String[] args) {
        Trie<Integer> trie = new Trie<Integer>();
        trie.insert("00x1", 0);
        trie.insert("01xx", 1);
        trie.insert("100x", 2);
        trie.insert("1010", 3);
        trie.insert("1x11", 4);
        System.out.println(trie.get("0110")); // [1]
        System.out.println(trie.get("1011")); // [4]
    }
}
like image 183
Qiang Jin Avatar answered Sep 19 '22 20:09

Qiang Jin