
Calculate if two infinite regex solution sets don't intersect

I want to calculate whether two arbitrary regular expressions have any overlapping solutions (assuming it's possible).

For example, these two regular expressions can be shown to have no intersection by brute force, because both solution sets are finite and can therefore be enumerated.

^1(11){0,1000}$ ∩     ^(11){0,1000}$        = {}
{1,111, ..., ..111} ∩ {11,1111, ..., ...11} = {}
{}                                          = {}
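The brute-force check above can be sketched in Python. This is only an illustration, assuming Python's `re` module and assuming that it is enough to try strings made of the character `1` (both patterns can match nothing else). The helper name `solutions` is made up for this example.

```python
import re

# The two bounded patterns from the example above.
ODD = re.compile(r"^1(11){0,1000}$")
EVEN = re.compile(r"^(11){0,1000}$")

def solutions(pattern, max_len):
    """Return every string of '1's up to max_len that the pattern matches."""
    return {"1" * n for n in range(max_len + 1) if pattern.match("1" * n)}

# The longest possible match has 1 + 2 * 1000 = 2001 characters,
# so enumerating up to that length covers both solution sets completely.
MAX_LEN = 2001
odd_set = solutions(ODD, MAX_LEN)
even_set = solutions(EVEN, MAX_LEN)
overlap = odd_set & even_set

print(len(odd_set), len(even_set), overlap)
```

Each set has 1001 members and the overlap is empty. The same approach cannot work once `{0,1000}` becomes `*`, because no `MAX_LEN` bounds the solution sets any more.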

But replacing the {0,1000} with * removes the possibility of a brute-force solution, so a smarter algorithm is needed.

^1(11)*$ ∩ ^(11)*$ = {}
{1,^1(11)*$} ∩ {^(11)*$} = {}
{1,^1(11)*$} ∩ {11,^11(11)*$} = {}
{1,111,^111(11)*$} ∩ {11,^(11)*$} = {}
.....

In another similar question one answer was to calculate the intersection regex. Is that possible to do? If so how would one write an algorithm to do such a thing?

I think this problem might be in the domain of the halting problem.

EDIT:

I've used the accepted solution to create the DFAs for the example problem. It's fairly easy to see how you can use a BFS or DFS on the graph of states for M_3 to determine if a final state from M_3 is reachable.

[Figure: DFA solution]
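The reachability check mentioned in this edit could look roughly like the sketch below. The figure is not reproduced in the text, so the state names (`"even"`, `"odd"`) and the dictionary layout of M_3 are assumptions made for illustration; they model the product of a DFA for ^1(11)*$ and a DFA for ^(11)*$ over the alphabet {1}.

```python
from collections import deque

# Hand-written product DFA M_3 (assumed layout, not taken from the figure).
# A state (a, b) pairs the state of M_1 with the state of M_2; each component
# tracks whether the number of characters read so far is even or odd.
M3_START = ("even", "even")
M3_TRANSITIONS = {
    (("even", "even"), "1"): ("odd", "odd"),
    (("odd", "odd"), "1"): ("even", "even"),
    (("even", "odd"), "1"): ("odd", "even"),
    (("odd", "even"), "1"): ("even", "odd"),
}
# M_1 accepts odd lengths and M_2 accepts even lengths, so M_3 accepts
# only in the state where the first component is odd and the second is even.
M3_ACCEPTING = {("odd", "even")}

def find_accepted_string(start, transitions, accepting):
    """Breadth-first search from the start state.

    Returns a shortest accepted string, or None if no accepting state
    is reachable (which means the language is empty)."""
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        state, word = queue.popleft()
        if state in accepting:
            return word
        for (source, symbol), target in transitions.items():
            if source == state and target not in seen:
                seen.add(target)
                queue.append((target, word + symbol))
    return None

print(find_accepted_string(M3_START, M3_TRANSITIONS, M3_ACCEPTING))  # None
```

Only the states ("even", "even") and ("odd", "odd") are reachable from the start state, so the search never meets the accepting state and the intersection is empty.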

Kendall Hopkins asked Oct 11 '11 21:10



2 Answers

It is not in the domain of the halting problem; deciding whether the intersection of regular languages is empty or not can be solved as follows:

  1. Construct a DFA M1 for the first language.
  2. Construct a DFA M2 for the second language. Hint: Kleene's Theorem and Power Set machine construction
  3. Construct a DFA M3 for M1 intersect M2. Hint: Cartesian Product Machine construction
  4. Determine whether L(M3) is empty. Hint: If M3 has n states, and M3 doesn't accept any strings of length no greater than n, then L(M3) is empty... why?
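Step 4 can be tried out literally with the sketch below, which enumerates every string of length at most n and runs the DFA on each. The dictionary layout of the DFA and the names `accepts` and `is_empty_by_enumeration` are assumptions for this example. The enumeration is exponential in n, so it is only meant to illustrate the hint; a graph search over the states is the practical way to do the same check.

```python
from itertools import product

def accepts(dfa, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]

def is_empty_by_enumeration(dfa):
    """Try every string of length 0..n, where n is the number of states."""
    n = len(dfa["states"])
    for length in range(n + 1):
        for letters in product(sorted(dfa["alphabet"]), repeat=length):
            if accepts(dfa, "".join(letters)):
                return False
    return True

# Hypothetical 3-state DFA over {a, b}; state "r" is accepting but unreachable.
EXAMPLE = {
    "alphabet": {"a", "b"},
    "states": {"p", "q", "r"},
    "start": "p",
    "accepting": {"r"},
    "delta": {
        ("p", "a"): "q", ("p", "b"): "p",
        ("q", "a"): "p", ("q", "b"): "q",
        ("r", "a"): "r", ("r", "b"): "p",
    },
}

print(is_empty_by_enumeration(EXAMPLE))  # True
```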

Each of those things can be algorithmically done and/or checked. Also, naturally, once you have a DFA recognizing the intersection of your languages, you can construct a regex to match the language. And if you start out with a regex, you can make a DFA. This is definitely computable.

EDIT:

So to build a Cartesian Product Machine, you need two DFAs. Let M1 = (E, q0, Q1, A1, f1) and M2 = (E, q0', Q2, A2, f2). In both cases, E is the input alphabet, q0 is the start state, Q is the set of all states, A is the set of accepting states, and f is the transition function. Construct M3 where...

  1. E3 = E
  2. Q3 = Q1 x Q2 (ordered pairs)
  3. q0'' = (q0, q0')
  4. A3 = {(x, y) | x in A1 and y in A2}
  5. f3(s, (x, y)) = (f1(s, x), f2(s, y))

Provided I didn't make any mistakes, L(M3) = L(M1) intersect L(M2). Neat, huh?
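The construction above can be written down almost line for line. The sketch below is one possible encoding, not a reference implementation: the dictionary layout, the helper names (`intersect`, `accepts`, `parity_dfa`) and the key order `(state, symbol)` in the transition table are all choices made for this example. The two input machines are the DFAs for ^1(11)*$ and ^(11)*$ over the alphabet {1}.

```python
from itertools import product

def intersect(m1, m2):
    """Cartesian product machine: accepts exactly L(m1) intersect L(m2)."""
    assert m1["alphabet"] == m2["alphabet"], "both DFAs must share the alphabet E"
    states = set(product(m1["states"], m2["states"]))       # Q3 = Q1 x Q2
    delta = {
        ((x, y), s): (m1["delta"][(x, s)], m2["delta"][(y, s)])
        for (x, y) in states
        for s in m1["alphabet"]
    }                                                        # f3 moves both components
    accepting = {
        (x, y) for (x, y) in states
        if x in m1["accepting"] and y in m2["accepting"]
    }                                                        # A3
    return {
        "alphabet": m1["alphabet"],
        "states": states,
        "start": (m1["start"], m2["start"]),                 # (q0, q0')
        "accepting": accepting,
        "delta": delta,
    }

def accepts(dfa, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]

def parity_dfa(accepting_state):
    """Two-state DFA over {"1"} that tracks the parity of the input length."""
    return {
        "alphabet": {"1"},
        "states": {"even", "odd"},
        "start": "even",
        "accepting": {accepting_state},
        "delta": {("even", "1"): "odd", ("odd", "1"): "even"},
    }

M1 = parity_dfa("odd")    # ^1(11)*$
M2 = parity_dfa("even")   # ^(11)*$
M3 = intersect(M1, M2)

print(any(accepts(M3, "1" * n) for n in range(50)))  # False
```

M3 has four states, but its only accepting state, ("odd", "even"), can never be reached from ("even", "even"), which is why no string is accepted.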

Patrick87 answered Nov 11 '22 20:11



I've created a PHP implementation of Patrick87's answer. In addition to implementing the intersection via the Cartesian Product Machine, I've also implemented an alternative algorithm for finding intersections of DFAs using De Morgan.

Intersection( DFA_1, DFA_2 ) === ! UNION( ! DFA_1, ! DFA_2 )

* ! is defined as negation

This works very well for DFAs, because negating a fully defined DFA (one with a transition defined for every state and input symbol) only requires swapping the final and non-final states: add all non-final states to the final state set and remove all current final states from it (non-final -> final, final -> non-final). The union of two DFAs can be built easily by treating them as one NFA and adding a new start node that connects to the old start nodes of both DFAs by lambda transitions. Since the result of that union is an NFA, it has to be determinized again before the outer negation can be applied.
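The De Morgan identity can be checked with a small Python sketch. Note one deliberate substitution: the union below is built as a product machine with an "or" acceptance condition, which keeps the result a complete DFA, rather than as an NFA with a new start node and lambda transitions as described above. All names and the dictionary layout are assumptions for illustration and are not taken from the PHP library.

```python
def complement(dfa):
    """Negate a complete DFA by swapping final and non-final states."""
    return {**dfa, "accepting": dfa["states"] - dfa["accepting"]}

def union(m1, m2):
    """Union as a product machine (a swap-in for the NFA-based union)."""
    states = {(x, y) for x in m1["states"] for y in m2["states"]}
    return {
        "alphabet": m1["alphabet"],
        "states": states,
        "start": (m1["start"], m2["start"]),
        "accepting": {
            (x, y) for (x, y) in states
            if x in m1["accepting"] or y in m2["accepting"]
        },
        "delta": {
            ((x, y), s): (m1["delta"][(x, s)], m2["delta"][(y, s)])
            for (x, y) in states
            for s in m1["alphabet"]
        },
    }

def intersection_de_morgan(m1, m2):
    """Intersection(m1, m2) = !Union(!m1, !m2)."""
    return complement(union(complement(m1), complement(m2)))

def accepts(dfa, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]

def cycle_dfa(modulus, accepting_remainders):
    """Complete DFA over {"1"} that counts the input length modulo `modulus`."""
    states = set(range(modulus))
    return {
        "alphabet": {"1"},
        "states": states,
        "start": 0,
        "accepting": set(accepting_remainders),
        "delta": {(q, "1"): (q + 1) % modulus for q in states},
    }

odd = cycle_dfa(2, {1})      # ^1(11)*$
even = cycle_dfa(2, {0})     # ^(11)*$
mult3 = cycle_dfa(3, {0})    # hypothetical extra example: ^(111)*$

both = intersection_de_morgan(odd, mult3)
print([n for n in range(20) if accepts(both, "1" * n)])  # [3, 9, 15]

none = intersection_de_morgan(odd, even)
print(any(accepts(none, "1" * n) for n in range(20)))    # False
```

The complement step is only valid because every DFA here is complete; with missing transitions, rejected strings that fall off the machine would stay rejected after the swap.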

In addition to solving the intersection problem, the library I created is also able to determinize a NFA to a DFA and convert Regex to NFA.

EDIT:

I have created a webapp that allows these sorts of transformations on regex languages, using what I learned from this question (and others).

Kendall Hopkins answered Nov 11 '22 20:11
