Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data Structure for representing patterns in strings

I'm looking for a good data structure to represent strings of the form:

Domain:Key1=Value1,Key2=Value2...
  • Each "Domain" can contain the following pattern characters - *, ? (* - 0 or more characters, ? - 0 or 1 character)

  • Each "Key" can contain the following pattern characters - *, ? (* - 0 or more characters, ? - 0 or 1 character)

  • Each "Value" can contain the following pattern characters - *, ? (* - 0 or more characters, ? - 0 or 1 character)

Examples:

JBoss:*
*:*
JBoss:type=ThreadPool,*
JBoss:type=Thread*,*
JB*:name=http1,type=ConnectionPool

If you are familiar with JMX ObjectName's then essentially this is the ObjectName pattern.

I'm looking for ways to easily store a rule corresponding to each pattern and be able to quickly delete,update and add new rules.

I started out by using a Prefix Trie, but got stuck with the pattern characters *, ?.

like image 901
Raj Avatar asked Jun 16 '11 22:06

Raj


People also ask

Which data structure is used in string?

Overview. A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding.

What is string pattern matching in data structure?

Pattern Matching works by "reading" through text strings to match patterns that are defined using Pattern Matching Expressions, also known as Regular Expressions. Pattern Matching can be used in Identification as well as in Pre-Classification Processing, Page Processing, or Storage Processing.

Which algorithm is best for pattern matching?

The so-called naive or brute force algorithm is the most intuitive approach to the string pattern-matching problem. This algorithm attempts simply to match the pattern in the target at successive positions from left to right.

What is pattern string?

Pattern matching and strings. By far the most common form of pattern matching involves strings of characters. In many programming languages, a particular syntax of strings is used to represent regular expressions, which are patterns describing string characters.


1 Answers

I think the easiest way to do it would be to build an NFA like trie, one which allows transitions to multiple states. This, of course, adds the complexity of having another data structure which maps to multiple states given a set of characters to match. For instance, with your example:

JBoss:*
*:*
JBoss:type=ThreadPool,*
JBoss:type=Thread*,*
JB*:name=http1,type=ConnectionPool

Lets say you try to match JBoxx:name=*

When you match the substring JBo, you would need a data structure to hold the states JBo and JB* and * since you have three branches at this point. When the x comes in, you can discard the JBo route since its a dead end and use JB* and *. The easy implementation is to have a set of possible match states at any given time and try the next character on each of them. You would also need a way to resolve multiple matches (as in this case) - perhaps something as simple as a longest match?

It all seems to make sense when you think of the trie as an NFA instead of the well-accepted DFA format. Hope that helps.

like image 121
kyun Avatar answered Sep 29 '22 12:09

kyun