Testing intersection of two regular languages

Tags: parsing, finite-automata, automata

I want to test whether two languages have a string in common. Both of these languages are from a subset of regular languages described below and I only need to know whether there exists a string in both languages, not produce an example string.

The language is specified by a glob-like string like

/foo/**/bar/*.baz

where ** matches zero or more characters (including /), * matches zero or more characters other than /, and all other characters are literal.
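To pin down the semantics, here is a minimal sketch that translates such a glob into an ordinary Python regular expression. The function name `glob_to_regex` is hypothetical, and the sketch assumes there is no escape syntax in the glob (every character other than `*` is a literal).

```python
import re

def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a glob into a compiled regex (assumed semantics, no escapes)."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")       # ** matches any characters, including /
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")    # * matches any characters except /
            i += 1
        else:
            parts.append(re.escape(glob[i]))  # everything else is literal
            i += 1
    # DOTALL so that "." also matches newlines
    return re.compile("".join(parts), re.DOTALL)

pattern = glob_to_regex("/foo/**/bar/*.baz")
print(bool(pattern.fullmatch("/foo/x/y/bar/a.baz")))  # True
print(bool(pattern.fullmatch("/foo/x/bar/a/b.baz")))  # False: * cannot cross /
```

This only tests membership of a single string; it does not by itself decide whether two globs share a string.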

Any ideas?

thanks, mike

EDIT:

I implemented something that seems to perform well, but have yet to try a correctness proof. You can see the source and unit tests

asked Feb 26 '10 00:02

Mike Samuel

2 Answers

Build finite automata (FAs) A and B for the two languages, and construct the "intersection FA" AnB. If AnB has at least one accepting state reachable from a start state, then there is a word that is in both languages.

Constructing AnB could be tricky, but I'm sure there are FA textbooks that cover it. The approach I would take is:

- The states of AnB are the Cartesian product of the states of A and B. A state in AnB is written (a, b), where a is a state in A and b is a state in B.
- A transition (a, b) ->r (c, d) (meaning there is a transition from (a, b) to (c, d) on symbol r) exists iff a ->r c is a transition in A and b ->r d is a transition in B.
- (a, b) is a start state in AnB iff a and b are start states in A and B respectively.
- (a, b) is an accepting state in AnB iff a and b are accepting states in A and B respectively.

This is all off the top of my head, and hence completely unproven!
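The construction above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a verified implementation: the helper names (`tokenize`, `eps_closure`, `successors`, `globs_intersect`) are hypothetical, the glob semantics are the ones given in the question, and the alphabet is collapsed to the literal characters of both globs, `/`, and one stand-in for every other character (all other characters behave identically in both automata).

```python
from collections import deque

def tokenize(glob):
    """Split a glob into ("lit", c), ("seg", None) for *, ("any", None) for **."""
    tokens, i = [], 0
    while i < len(glob):
        if glob.startswith("**", i):
            tokens.append(("any", None)); i += 2
        elif glob[i] == "*":
            tokens.append(("seg", None)); i += 1
        else:
            tokens.append(("lit", glob[i])); i += 1
    return tokens

def eps_closure(tokens, states):
    """State s means tokens[:s] are matched; a wildcard may match nothing."""
    result, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        if s < len(tokens) and tokens[s][0] != "lit" and s + 1 not in result:
            result.add(s + 1)
            stack.append(s + 1)
    return result

def successors(tokens, s, ch):
    """NFA states reachable from state s by consuming the character ch."""
    if s == len(tokens):
        return set()
    kind, lit = tokens[s]
    if kind == "lit":
        direct = {s + 1} if lit == ch else set()
    elif kind == "any" or ch != "/":
        direct = {s}                      # wildcard consumes ch and loops
    else:
        direct = set()                    # * cannot consume /
    return eps_closure(tokens, direct)

def globs_intersect(g1, g2):
    t1, t2 = tokenize(g1), tokenize(g2)
    alphabet = {c for kind, c in t1 + t2 if kind == "lit"} | {"/"}
    # One representative for all characters not mentioned in either glob.
    alphabet.add(next(chr(c) for c in range(1, 0x110000) if chr(c) not in alphabet))
    starts = {(a, b) for a in eps_closure(t1, {0}) for b in eps_closure(t2, {0})}
    seen, queue = set(starts), deque(starts)
    while queue:                          # breadth-first search over pairs (a, b)
        a, b = queue.popleft()
        if a == len(t1) and b == len(t2): # both components accepting
            return True
        for ch in alphabet:
            for c in successors(t1, a, ch):
                for d in successors(t2, b, ch):
                    if (c, d) not in seen:
                        seen.add((c, d))
                        queue.append((c, d))
    return False

print(globs_intersect("/foo/**/bar/*.baz", "/foo/*/bar/x.baz"))  # True
print(globs_intersect("*.txt", "**/a.txt"))                      # False
```

Only pairs that are actually reachable get built, so the search visits at most (m + 1) * (n + 1) product states for globs with m and n tokens.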

answered Oct 11 '22 22:10

Edmund

I just did a quick search and this problem is decidable (i.e., it can be done), but I don't know of any good algorithms to do it. One solution is:

1. Convert both regular expressions to NFAs A and B.
2. Create an NFA, C, that represents the intersection of A and B.
3. Now try every string of length 0 up to the number of states in C and see whether C accepts it (a longer accepted string must repeat a state at some point, so a shorter accepted string also exists).

I know this might be a little hard to follow, but this is the only way I know how.
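For illustration, the bounded enumeration in the last step might look like the sketch below. It is a brute-force check under assumptions, not a practical algorithm: `to_regex` and `intersect_by_enumeration` are hypothetical names, the glob semantics are those from the question, "C accepts the string" is tested by matching both patterns, and `(len(g1) + 1) * (len(g2) + 1)` is used as a safe upper bound on the number of states in C.

```python
import itertools
import re

def to_regex(glob):
    # Split on wildcards and keep them: "a**b*" -> ["a", "**", "b", "*", ""]
    pieces = re.split(r"(\*\*|\*)", glob)
    table = {"**": ".*", "*": "[^/]*"}
    return re.compile("".join(table.get(p, re.escape(p)) for p in pieces), re.DOTALL)

def intersect_by_enumeration(g1, g2):
    r1, r2 = to_regex(g1), to_regex(g2)
    # Literal characters of both globs, "/", and one stand-in for the rest.
    alphabet = set((g1 + g2).replace("*", "")) | {"/"}
    alphabet.add(next(chr(c) for c in range(1, 0x110000) if chr(c) not in alphabet))
    # A glob of k characters needs at most k + 1 NFA states.
    max_len = (len(g1) + 1) * (len(g2) + 1)
    for length in range(max_len + 1):
        for chars in itertools.product(sorted(alphabet), repeat=length):
            word = "".join(chars)
            if r1.fullmatch(word) and r2.fullmatch(word):
                return True
    return False

print(intersect_by_enumeration("a*", "*b"))  # True, e.g. "ab"
print(intersect_by_enumeration("a", "b"))    # False
```

The number of candidate strings grows exponentially with the length bound, so this is only usable on very short globs; a graph search over the states of C answers the same question without enumerating strings.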

answered Oct 11 '22 20:10

Bishnu
