
Identify type of regex pattern

In my application I add EditText fields based on a response provided by the server. For each EditText the server also provides a regex pattern to match. I am able to match the pattern and run validations successfully. But I want to identify the type of the regex pattern so that I can open the keyboard that suits the value the EditText should accept.

For example, if the EditText should accept an email address, a keyboard with the @ sign should open, and if the EditText accepts numeric values, the numeric keypad should open.

Is there any library that can return a type such as "Email" or "Number" from a regular expression, given that there can be several different types of regex pattern?

EDIT: I know how to set the input type for an EditText, but I need to find out the type from the regex pattern. I cannot make changes on the server; I have to handle this on the client side.

baldguy asked Aug 11 '15 15:08



1 Answer

There most probably isn't one, because there is no way to tell for sure. Anything you come up with will be a heuristic.

Heuristic one:

  1. If the pattern looks for something containing a dot, followed by an @ sign, followed by something containing a dot, it's an email validation.
  2. If the pattern contains only \d, or number ranges ([1-5]), or single numbers (7) plus repetition metacharacters (?, *, +, {4,12}), it's a number validation.
  3. If the pattern contains \w and no @ sign, it's regular text.
  4. Continue in the same spirit.

    • + high control. You can always add new guesses when you see that your results aren't accurate in some cases
    • - requires more code to implement
    • - requires very good knowledge of regexes
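The rules of heuristic one could be sketched roughly as follows. This is a minimal illustration, not a complete classifier: the class name `PatternTextHeuristic`, the `InputKind` enum, and the exact token list accepted as "numeric" are all assumptions made for the example, and real server patterns will need more rules.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: guesses an input kind by inspecting the text of a regex. */
final class PatternTextHeuristic {

    enum InputKind { EMAIL, NUMBER, TEXT, UNKNOWN }

    // Matches regex source built only from \d, digit classes such as [1-5],
    // literal digits, the anchors ^ and $, and the repetition operators
    // ?, *, + and {m,n}. The token list is an assumption; extend it as needed.
    private static final Pattern NUMERIC_SOURCE = Pattern.compile(
            "(?:\\\\d|\\[[0-9-]+\\]|[0-9]|[?*+^$]|\\{\\d+(?:,\\d*)?\\})+");

    static InputKind guess(String regexSource) {
        int at = regexSource.indexOf('@');
        // Rule 1: a dot somewhere before the @ sign and a dot somewhere after it.
        if (at > 0
                && regexSource.lastIndexOf('.', at) >= 0
                && regexSource.indexOf('.', at + 1) >= 0) {
            return InputKind.EMAIL;
        }
        // Rule 2: nothing but digit tokens, anchors and repetition.
        if (NUMERIC_SOURCE.matcher(regexSource).matches()) {
            return InputKind.NUMBER;
        }
        // Rule 3: \w is present and there is no @ sign.
        if (at < 0 && regexSource.contains("\\w")) {
            return InputKind.TEXT;
        }
        return InputKind.UNKNOWN;
    }
}
```

On Android, the returned kind could then be translated into an `InputType` value for the EditText, for example `InputType.TYPE_CLASS_NUMBER` for `NUMBER`. New guesses go in as additional `if` branches.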

Heuristic two:

Use a list of strings whose type you know and try to match them against the regex. For example, for emails try [email protected].

  • + easy to implement. Small chance of problematic logic
  • - least amount of control. If the server is giving you email patterns for different domains, you can't guess that this is an email pattern unless you know all possible domains
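A minimal sketch of heuristic two, assuming a small hand-picked set of probe strings. The class name, the enum, the probe values, and the order in which the categories are checked are all illustrative choices rather than anything prescribed above. Plain words are tried first so that a permissive pattern such as `\w+`, which also accepts digits, is not mistaken for a number pattern.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: guesses an input kind by matching known probe strings. */
final class ProbeStringHeuristic {

    enum InputKind { EMAIL, NUMBER, TEXT, UNKNOWN }

    // Illustrative probe strings whose type is known in advance.
    private static final String[] WORDS = {"hello", "Hello world", "abc"};
    private static final String[] EMAILS = {"user@example.com", "first.last@example.org"};
    private static final String[] NUMBERS = {"7", "42", "12345", "1234567890"};

    static InputKind guess(String regexSource) {
        // Pattern.compile throws PatternSyntaxException if the server sends an invalid regex.
        Pattern pattern = Pattern.compile(regexSource);
        if (matchesAny(pattern, WORDS)) {
            return InputKind.TEXT;   // accepts plain words, so treat it as free text
        }
        if (matchesAny(pattern, EMAILS)) {
            return InputKind.EMAIL;
        }
        if (matchesAny(pattern, NUMBERS)) {
            return InputKind.NUMBER;
        }
        return InputKind.UNKNOWN;
    }

    // True if at least one probe is fully matched by the pattern.
    private static boolean matchesAny(Pattern pattern, String[] probes) {
        for (String probe : probes) {
            if (pattern.matcher(probe).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

The limitation noted in the list above shows up directly here: a pattern restricted to a domain that is not among the `EMAILS` probes falls through to `UNKNOWN`.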

Heuristic three:

Use a library that can generate example strings from a regex and match them against your own regexes to determine the type. Here is one for Java and another one for JavaScript.

  • + gives a good combination of high control and easy implementation
  • - you still have to write your own regexes (not as trivial as the 2nd heuristic)
  • - people sometimes write regexes that allow some false positives. Therefore, generated strings might not be in the perfect format (not as much control as the 1st heuristic)
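The classification half of heuristic three might look like the sketch below. It assumes the sample strings have already been produced by a generator library, which is not shown, and no particular library's API is implied. The class name, the two reference regexes, and the 80% threshold are assumptions for illustration; the threshold exists because, as noted above, generated strings may not all be in perfect format.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: classifies samples that a regex-to-string generator produced. */
final class GeneratedSampleClassifier {

    enum InputKind { EMAIL, NUMBER, TEXT, UNKNOWN }

    // "Your own regexes": deliberately loose reference patterns (assumptions).
    private static final Pattern MY_EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");
    private static final Pattern MY_NUMBER = Pattern.compile("\\d+");

    static InputKind classify(List<String> samples) {
        if (samples.isEmpty()) {
            return InputKind.UNKNOWN;
        }
        int emails = 0;
        int numbers = 0;
        for (String sample : samples) {
            if (MY_EMAIL.matcher(sample).matches()) {
                emails++;
            } else if (MY_NUMBER.matcher(sample).matches()) {
                numbers++;
            }
        }
        // Accept a kind when at least 80% of the samples fit it, which
        // tolerates a few off-format strings from lenient server patterns.
        if (emails * 10 >= samples.size() * 8) {
            return InputKind.EMAIL;
        }
        if (numbers * 10 >= samples.size() * 8) {
            return InputKind.NUMBER;
        }
        return InputKind.TEXT;
    }
}
```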


Are the regexes static?

  • If yes - you should make a mapping and use that.
  • If no - use a heuristic like one of the above and improve it over time as you gain more statistics about how the generated regexes usually look.
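For the static case, the mapping can be a plain lookup table keyed by the exact pattern text. The two patterns below are made-up placeholders, not patterns from the question; the class and enum names are likewise hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps regexes that are known in advance to an input kind. */
final class StaticPatternMapping {

    enum InputKind { EMAIL, NUMBER, TEXT, UNKNOWN }

    private static final Map<String, InputKind> KNOWN = new HashMap<>();
    static {
        // Placeholder entries; replace with the patterns the server actually sends.
        KNOWN.put("^\\d+$", InputKind.NUMBER);
        KNOWN.put("^[\\w.]+@[\\w-]+\\.[a-z]{2,}$", InputKind.EMAIL);
    }

    // Exact string lookup; anything not listed is reported as UNKNOWN.
    static InputKind lookup(String regexSource) {
        return KNOWN.getOrDefault(regexSource, InputKind.UNKNOWN);
    }
}
```

An `UNKNOWN` result is a natural place to fall back to one of the heuristics and to log the pattern so the table can be extended later.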
ndnenkov answered Oct 19 '22 10:10