
Can the compiler feasibly calculate a DFA from a regular expression?

In modding a closed-source game I'm modifying the machine code at runtime to jmp into my own code. To do this in a generic manner I'm using pattern matching to find the code locations I want to modify. (The patterns consist only of characters/bytes and wildcards where bytes can vary.) By building a deterministic finite automaton from all my patterns I can search in linear time.

However I've found that building the DFA takes more time than actually running it, especially in debug builds (which I certainly want during development), and that's only going to get worse as I add more patterns. But that could easily be done offline. I'm currently thinking about the how; can the compiler do it?

As far as I know it's impossible with constexpr functions since I can't allocate static memory with them. But I have a vague feeling that it should be doable in a type-safe manner with template meta-programming. Or am I likely to run into recursion limits when creating automata with hundreds or thousands of states?

And regardless of technical possibility, is it reasonable? Or should I rather, say, calculate a source file in a separate build step?

Mr. Wonko asked Dec 22 '14 22:12

1 Answer

Yes, this is possible. One standard route is Thompson's construction, which turns the regular expression into an NFA, followed by the subset (powerset) construction, which turns that NFA into a DFA. The catch is that the NFA-to-DFA step can blow up exponentially in the number of states in the worst case.
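As a rough runtime sketch of the NFA-to-DFA step for the kind of patterns in the question (bytes plus wildcards): such patterns are already linear chains, so this version skips Thompson's construction and treats each "pattern index, elements matched" pair as an NFA position. Every name here (Pattern, kWildcard, Dfa, build_dfa, find_all) is made up for illustration, and the code favours clarity over build speed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

constexpr int kWildcard = -1;          // hypothetical marker for "any byte"
using Pattern = std::vector<int>;      // each element: 0..255 or kWildcard
using Pos = std::pair<int, int>;       // (pattern index, elements matched so far)
using PosSet = std::set<Pos>;          // one DFA state = one set of NFA positions

struct Dfa {
    std::vector<std::array<int, 256>> next;  // next[state][byte] -> state
    std::vector<std::vector<int>> accepts;   // patterns completed on entering a state
};

Dfa build_dfa(const std::vector<Pattern>& patterns) {
    Dfa dfa;
    std::map<PosSet, int> ids;   // position set -> DFA state number
    std::vector<PosSet> sets;    // DFA state number -> position set

    // Returns the state number for `s`, creating the state on first sight.
    auto intern = [&](const PosSet& s) -> int {
        auto found = ids.find(s);
        if (found != ids.end()) return found->second;
        const int id = static_cast<int>(sets.size());
        ids.emplace(s, id);
        sets.push_back(s);
        dfa.next.emplace_back();
        dfa.accepts.emplace_back();
        for (const Pos& p : s)
            if (p.second == static_cast<int>(patterns[p.first].size()))
                dfa.accepts[id].push_back(p.first);
        return id;
    };

    // A match may begin at any offset, so every state contains all (p, 0).
    PosSet start;
    for (int p = 0; p < static_cast<int>(patterns.size()); ++p) start.insert({p, 0});
    intern(start);

    for (std::size_t s = 0; s < sets.size(); ++s) {
        const PosSet current = sets[s];  // copy: intern() may reallocate `sets`
        for (int b = 0; b < 256; ++b) {
            PosSet target = start;
            for (const Pos& pos : current) {
                const Pattern& pat = patterns[pos.first];
                if (pos.second < static_cast<int>(pat.size()) &&
                    (pat[pos.second] == kWildcard || pat[pos.second] == b))
                    target.insert({pos.first, pos.second + 1});
            }
            const int t = intern(target);  // resolve first, then index dfa.next
            dfa.next[s][b] = t;
        }
    }
    return dfa;
}

// Scans `data` once; returns (pattern index, end offset) for every match.
std::vector<std::pair<int, std::size_t>> find_all(const Dfa& dfa,
                                                  const std::vector<std::uint8_t>& data) {
    assert(!dfa.next.empty());
    std::vector<std::pair<int, std::size_t>> hits;
    int state = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        state = dfa.next[state][data[i]];
        for (int p : dfa.accepts[state]) hits.push_back({p, i + 1});
    }
    return hits;
}
```

The search loop does one table lookup per input byte, which is where the linear running time comes from. All the cost sits in build_dfa, and the number of position sets it creates is the quantity that can grow exponentially.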

How to deal with the required recursion depth is discussed in the answers to this question.

It is possible to implement the algorithm using template metaprogramming. A list of basic building blocks, which let you store values and implement branches and functions, can be found here.

Here is an example for a function from the linked page:

template<int X, int Y>
struct Adder
{
   enum { result = X + Y };
};

This is a function that adds its two parameters and stores the result in the result enum member. You can call this at compile time with something like Adder<1, 2>::result, which will be expanded at compile time and act exactly like a literal 3 in your program.
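To check that the result really is a compile-time constant, a small sketch like the following can help. It repeats Adder so that it compiles on its own; the array name buffer is only illustrative.

```cpp
// Same metafunction as above, repeated so this snippet is self-contained.
template <int X, int Y>
struct Adder {
    enum { result = X + Y };
};

// Checked by the compiler; a wrong value would be a build error.
static_assert(Adder<1, 2>::result == 3, "Adder<1, 2> should be 3");

// Only constant expressions may be used as array bounds, so this line
// compiles only because the addition is evaluated at compile time.
int buffer[Adder<1, 2>::result];
```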

Since Thompson's algorithm relies on recursion, here is an example of a recursive computation evaluated at compile time:

template <unsigned n>
struct factorial
{
  enum { value = n * factorial<n-1>::value };
};

template <>
struct factorial<0> { enum { value = 1 }; };  // base case ends the recursion

This computes a factorial at compile time. The result can then be used in ordinary code, for example factorial<5>::value, which is the constant 120.
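The same idea can be pushed as far as a whole transition table. The sketch below swaps in a different technique from the templates above: it assumes a C++14 compiler, where relaxed constexpr allows loops and writes into a fixed-size struct that is returned by value. It handles a single pattern of bytes and wildcards; Table, build_table, contains, kAny, kPattern and the MaxStates bound of 16 are hypothetical names and values chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int kAny = -1;  // hypothetical wildcard marker

template <std::size_t MaxStates>
struct Table {
    std::uint16_t next[MaxStates][256];  // next[state][byte] -> state
    bool accept[MaxStates];              // true if the pattern ends in this state
    std::uint32_t sets[MaxStates];       // NFA position set behind each DFA state
    std::size_t count;                   // number of DFA states in use
};

// Subset construction for one pattern, run by the compiler.
// Bit i of a position set means "i pattern elements matched so far".
template <std::size_t MaxStates, std::size_t N>
constexpr Table<MaxStates> build_table(const int (&pat)[N]) {
    static_assert(N < 32, "position sets are stored in a 32-bit mask");
    Table<MaxStates> t{};
    t.sets[0] = 1u;
    t.count = 1;
    for (std::size_t s = 0; s < t.count; ++s) {
        t.accept[s] = ((t.sets[s] >> N) & 1u) != 0;
        for (int b = 0; b < 256; ++b) {
            std::uint32_t target = 1u;  // a match may start at every offset
            for (std::size_t i = 0; i < N; ++i)
                if (((t.sets[s] >> i) & 1u) && (pat[i] == kAny || pat[i] == b))
                    target |= 1u << (i + 1);
            std::size_t id = 0;  // linear search for an existing state
            while (id < t.count && t.sets[id] != target) ++id;
            if (id == t.count) {
                // Reaching this throw during constant evaluation is a compile error.
                if (t.count == MaxStates) throw "MaxStates too small";
                t.sets[t.count++] = target;
            }
            t.next[s][b] = static_cast<std::uint16_t>(id);
        }
    }
    return t;
}

// One table lookup per byte; usable at run time and in constant expressions.
template <std::size_t MaxStates>
constexpr bool contains(const Table<MaxStates>& t, const unsigned char* data,
                        std::size_t len) {
    std::size_t state = 0;
    for (std::size_t i = 0; i < len; ++i) {
        state = t.next[state][data[i]];
        if (t.accept[state]) return true;
    }
    return false;
}

constexpr int kPattern[] = {0xE8, kAny, 0x90};
constexpr auto kTable = build_table<16>(kPattern);  // built during compilation

constexpr unsigned char kSample[] = {0x00, 0xE8, 0x12, 0x90, 0x90};
static_assert(contains(kTable, kSample, sizeof kSample), "pattern should be found");
```

Whether this scales to hundreds or thousands of states depends on the compiler's constant-evaluation limits and on build time, so generating a source file in a separate build step remains a reasonable alternative.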

Beginner answered Sep 17 '22 15:09
