Partial Matching of Regular Expressions

Tags: java, regex

In an NFA, it is easy to make every previously non-final state accepting, so that it matches the language of all prefixes (starting substrings) of a given language.

In the Java regex engine, is there a way to find out whether a string is a starting substring of some string that matches a given regex?

Let regexX mean "any start of", and let regexA be any given regex.

The resulting expression "regexXregexA" would match every starting substring of every string matched by regexA.

Example:

regexA = a*b

"a" matches "regexXa*b" because it is the start of "ab" (and of "aab").

Edit:

Since some people still fail to understand, here is a test program for this question:

import java.util.regex.*;

public class Test1 {
    public static void main(String[] args) {
        String regex = "a*b";
        System.out.println(partialMatch(regex, "aaa"));
    }

    // Should return true if there is a string that matches the regex and
    // starts with (but is not equal to) beginning, and false otherwise.
    public static boolean partialMatch(String regex, String beginning) {
        // The body is what this question asks for.
        throw new UnsupportedOperationException("not implemented yet");
    }
}

The expected output is true.

iantonuk asked Feb 06 '17 17:02


2 Answers

What you're looking for is called partial matching, and it's natively supported by the Java regex API (for the record, other engines which offer this feature include PCRE and boost::regex).

You can tell whether an input string matched partially by inspecting the result of the Matcher.hitEnd() method, which reports whether the last match operation hit the end of the input. If the match failed and hitEnd() returns true, more input could have changed the result.

Pattern pattern = Pattern.compile("a*b");
Matcher matcher = pattern.matcher("aaa");
System.out.println("Matches: " + matcher.matches());
System.out.println("Partial match: " + matcher.hitEnd());

This outputs:

Matches: false
Partial match: true
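
Wrapped up as the partialMatch method from the question, this could look roughly like the sketch below. The class name PartialMatchDemo is made up for illustration. Treat the check as a heuristic: hitEnd() only says that the engine ran out of input, so a full match that could still be extended (for example "a" against a+) is reported as not partial here, and patterns with lookarounds or anchors may report true even when no longer string can match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartialMatchDemo {
    // Heuristic sketch: "partial" means the whole input did not match,
    // but the engine stopped only because it reached the end of the input.
    static boolean partialMatch(String regex, String beginning) {
        Matcher matcher = Pattern.compile(regex).matcher(beginning);
        boolean fullMatch = matcher.matches();
        // hitEnd() refers to the matches() call made just above.
        return !fullMatch && matcher.hitEnd();
    }

    public static void main(String[] args) {
        System.out.println(partialMatch("a*b", "aaa")); // true: "aaab" would match
        System.out.println(partialMatch("a*b", "aab")); // false: already a full match
        System.out.println(partialMatch("a*b", "c"));   // false: fails before the end
    }
}
```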
Lucas Trzesniewski answered Oct 06 '22 19:10


In an NFA, it is easy to make every previously non-final state accepting, so that it matches the language of all prefixes (starting substrings) of a given language.

Indeed, it can be accomplished by adding a new final state and an ε-move from each state (final or non-final) to the new final state.
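
To make this concrete, here is a minimal sketch with a hand-built NFA for a*b (state 0 loops on 'a' and moves to the final state 1 on 'b'). The class and method names are hypothetical. Treating every state as accepting has the same effect as the ε-moves to a new final state, and it gives exactly the prefixes here because every state of this small NFA can still reach the final state.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class PrefixNfa {
    static final int START = 0;
    static final int FINAL = 1;

    // Transition function of the hand-built NFA for a*b.
    static Set<Integer> step(int state, char c) {
        Set<Integer> out = new HashSet<>();
        if (state == 0 && c == 'a') out.add(0);
        if (state == 0 && c == 'b') out.add(1);
        return out;
    }

    // Returns the set of states the NFA can be in after reading the input.
    static Set<Integer> run(String input) {
        Set<Integer> current = new HashSet<>(Collections.singleton(START));
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int state : current) {
                next.addAll(step(state, c));
            }
            current = next;
        }
        return current;
    }

    // Original language: the run must end in the final state.
    static boolean accepts(String input) {
        return run(input).contains(FINAL);
    }

    // Prefix language: every state counts as accepting, so any surviving state is enough.
    static boolean acceptsPrefix(String input) {
        return !run(input).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(accepts("aaa"));       // false
        System.out.println(acceptsPrefix("aaa")); // true
        System.out.println(acceptsPrefix("ba"));  // false
    }
}
```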

As far as I know, there is no regex equivalent for this operation.

It is possible that some regex libraries provide a way to verify whether a string is a partial match of a regex; I don't know. I don't know Java. I work mainly in PHP, and it doesn't provide such a feature. Maybe there are libraries that do it, but I never needed one.

For a small, specific regex you can try to build a new regex that matches the strings that would partially match the original regex, by combining these simple rules:

  • a -> a?
  • ab -> ab?
  • a* -> a*
  • a+ -> a*
  • a|b -> (a|b)?
  • etc

a and b above are sub-regexps of the original regex. Use parentheses as needed.
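
Read literally, the rules above fit single-character operands best. For arbitrary sub-regexes, one way to state them precisely is the usual prefix-closure recursion: P(c) = c?, P(AB) = P(A) | A P(B), P(A|B) = P(A) | P(B), P(A*) = A* P(A). The sketch below applies that recursion with hypothetical helper names (Node, lit, seq, alt, star) and is an illustration rather than a full regex parser. Note that the resulting regex also accepts the empty string and complete matches.

```java
import java.util.regex.Pattern;

public class PrefixRegex {
    // A regex fragment paired with a regex for all prefixes of its matches.
    static final class Node {
        final String full;   // matches the fragment itself
        final String prefix; // matches every prefix, including "" and full matches
        Node(String full, String prefix) {
            this.full = full;
            this.prefix = prefix;
        }
    }

    // P(c) = c?
    static Node lit(char c) {
        String quoted = Pattern.quote(String.valueOf(c));
        return new Node(quoted, "(?:" + quoted + ")?");
    }

    // P(AB) = P(A) | A P(B)
    static Node seq(Node a, Node b) {
        return new Node("(?:" + a.full + b.full + ")",
                        "(?:" + a.prefix + "|" + a.full + b.prefix + ")");
    }

    // P(A|B) = P(A) | P(B)
    static Node alt(Node a, Node b) {
        return new Node("(?:" + a.full + "|" + b.full + ")",
                        "(?:" + a.prefix + "|" + b.prefix + ")");
    }

    // P(A*) = A* P(A)
    static Node star(Node a) {
        String repeated = "(?:" + a.full + ")*";
        return new Node(repeated, repeated + a.prefix);
    }

    // True if the input is a prefix of some string matched by the node.
    static boolean isPrefixOfMatch(Node node, String input) {
        return Pattern.compile(node.prefix).matcher(input).matches();
    }

    public static void main(String[] args) {
        Node aStarB = seq(star(lit('a')), lit('b')); // a*b
        System.out.println(isPrefixOfMatch(aStarB, "aaa")); // true
        System.out.println(isPrefixOfMatch(aStarB, "aab")); // true
        System.out.println(isPrefixOfMatch(aStarB, "ba"));  // false
    }
}
```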

axiac answered Oct 06 '22 19:10