Edit distance between two regular expressions

Tags:

string

regex

algorithm

dynamic-programming

I faced this question in an interview:

Given two regular expressions, compute the edit distance between them. The edit distance is defined as the smallest edit distance between any two strings generated by the two regular expressions, respectively.

Formally, we are looking for d(L1,L2) = min { d(x,y) | x from L1, y from L2 }, where L1 and L2 are the languages generated by the two regular expressions.

I was not able to solve it during the interview, and even now I don't have any clue how to solve it. Any ideas?

I think this is the same as http://www.spoj.com/problems/AMR10B/

386

asked Apr 30 '15 08:04

Quixotic

1 Answer

There are finite state machines that represent the two languages. Let's say the machine for the first language has states S[1], S[2], ..., S[N1] and transitions c: S[i] -> S[j] (meaning state i goes to state j under input character c), and the machine for the second language has states T[1], T[2], ..., T[N2] (with its own set of transitions).

Now, you can construct a weighted multigraph whose nodes are pairs of states, with an edge (S[i1], T[i2]) -> (S[j1], T[j2]) if any of these four cases holds:

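There's c: S[i1] -> S[j1] and i2 = j2 (a character appears only in the first string). This has weight 1
There's c: T[i2] -> T[j2] and i1 = j1 (a character appears only in the second string). This has weight 1
There's c: S[i1] -> S[j1] and c: T[i2] -> T[j2] (both strings take the same character). This has weight 0
There's c: S[i1] -> S[j1] and d: T[i2] -> T[j2] with c != d (a substitution). This has weight 1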

Then, finding the lowest weight path from the pair of start states to any pair of accepting states gives you the minimal edit distance.
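
The construction above can be sketched in Python roughly as follows. This is only an illustration under some assumptions that are not part of the answer: each regular expression has already been converted to a finite automaton without epsilon transitions, and an automaton is passed as a tuple `(start, accepting_states, transitions)` where `transitions` maps a state to a list of `(char, next_state)` pairs. The function name `regex_edit_distance` and this representation are made up for the example. The product graph is never built explicitly; Dijkstra's algorithm generates the edges of the four cases on the fly.

```python
import heapq

def regex_edit_distance(a, b):
    """Lowest-weight path in the product graph of two automata.

    a, b: (start, accepting_states, transitions), where transitions maps
    state -> list of (char, next_state). Assumes no epsilon transitions.
    Returns None if no pair of accepting states is reachable.
    """
    start_a, acc_a, tr_a = a
    start_b, acc_b, tr_b = b
    start = (start_a, start_b)
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, (s, t) = heapq.heappop(heap)
        if d > dist[(s, t)]:
            continue  # stale heap entry
        if s in acc_a and t in acc_b:
            return d  # first accepting pair popped is the closest one
        edges = []
        # Case 1: only the first machine moves (weight 1).
        for _, s2 in tr_a.get(s, []):
            edges.append(((s2, t), 1))
        # Case 2: only the second machine moves (weight 1).
        for _, t2 in tr_b.get(t, []):
            edges.append(((s, t2), 1))
        # Cases 3 and 4: both move; weight 0 on equal characters, else 1.
        for c, s2 in tr_a.get(s, []):
            for e, t2 in tr_b.get(t, []):
                edges.append(((s2, t2), 0 if c == e else 1))
        for node, w in edges:
            nd = d + w
            if nd < dist.get(node, float("inf")):
                dist[node] = nd
                heapq.heappush(heap, (nd, node))
    return None

# Hand-built automata for ab* and for the single string abc.
ab_star = (0, {1}, {0: [("a", 1)], 1: [("b", 1)]})
abc = (0, {3}, {0: [("a", 1)], 1: [("b", 2)], 2: [("c", 3)]})
print(regex_edit_distance(ab_star, abc))
```

Since all weights are 0 or 1, a 0-1 BFS with a deque would also work here; Dijkstra is used only because it is the more familiar choice. The product graph has at most N1 * N2 nodes, so the search stays polynomial in the sizes of the two automata.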

158

answered Oct 19 '22 20:10

Paul Hankin
