
How to form a regex to recognize correct declaration of variable names [closed]

Tags:

java

regex

I would like to form a regex to recognize the declaration of a variable name. The user enters a string that they would like to use as a variable name, and the program has to check whether that name is valid.

  1. The first character of the variable name must be either a letter or an underscore. It must not be a digit.
  2. No commas or blanks are allowed in the variable name.
  3. No special symbols other than the underscore are allowed in the variable name.

I have been trying for the whole day and couldn't get the correct one.

TK Cheah asked Jul 10 '13 06:07



3 Answers

First, list the characters that are valid in the first position:

[a-zA-Z_$]

Then list the characters that are valid in every later position:

[a-zA-Z_$0-9]

We want to match the whole string, and the first character can be followed by zero or more of the other characters, so the regex becomes:

^[a-zA-Z_$][a-zA-Z_$0-9]*$

I allow uppercase letters (as well as dollar signs) as the first character, because this is a test for validity, not for adherence to naming conventions. (By convention, constants are written in all caps, including the first letter.)
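A minimal sketch of how the regex above could be used from Java. The class name IdentifierRegexCheck and the method isValidName are made up for illustration; only the pattern itself comes from this answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the class and method names are illustrative only.
final class IdentifierRegexCheck {
    // Pattern from the answer: the first character is a letter, '_' or '$';
    // every following character may additionally be a digit.
    private static final Pattern VALID_NAME =
            Pattern.compile("^[a-zA-Z_$][a-zA-Z_$0-9]*$");

    // Returns true when the whole candidate string matches the pattern.
    static boolean isValidName(String candidate) {
        return candidate != null && VALID_NAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"count", "_tmp1", "$value", "MAX_SIZE", "1abc", "my var", "a,b", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValidName(s));
        }
    }
}
```

The first four samples are accepted and the last four are rejected. Keep in mind that a regex like this only checks the character rules: reserved words such as int or class would still pass, so a stricter validator would need a separate keyword check.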

Alex Gittemeier answered Oct 20 '22 10:10


You can use this:

"^[_a-z]\\w*$"

How it works:

^        // Match at the beginning of the string
[_a-z]   // Match either "_" or a lowercase letter "a-z" as the first character
\\w*     // Match zero or more word characters, [a-zA-Z0-9_], after the first character
$        // Match at the end of the string

Note: according to the Java naming conventions, a variable name should not start with an uppercase letter, so I have not included [A-Z] in the first character class.

Also, since Java allows $ in a variable name, even at the start, you should consider adding it to your allowed character set. In that case, the regex above becomes:

"^[_$a-z][\\w$]*$"
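
To see how the two variants differ, here is a small sketch using String.matches. The class name ConventionRegexCheck and its method names are hypothetical; the two patterns are the ones given above.

```java
// Hypothetical demo class; the class, field and method names are illustrative only.
final class ConventionRegexCheck {
    // Lowercase letter or underscore first, then any word characters.
    static final String BASIC = "^[_a-z]\\w*$";
    // Same idea, but '$' is also accepted in every position.
    static final String WITH_DOLLAR = "^[_$a-z][\\w$]*$";

    static boolean matchesBasic(String name) {
        return name.matches(BASIC);
    }

    static boolean matchesWithDollar(String name) {
        return name.matches(WITH_DOLLAR);
    }

    public static void main(String[] args) {
        String[] samples = {"userName", "_x9", "UserName", "$tmp", "a$b", "9a", "a b"};
        for (String s : samples) {
            System.out.println(s + " basic=" + matchesBasic(s)
                    + " withDollar=" + matchesWithDollar(s));
        }
    }
}
```

String.matches already requires the entire string to match, so the ^ and $ anchors are redundant here but harmless. They do matter if the same pattern is later used with Matcher.find.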

Rohit Jain answered Oct 20 '22 09:10


This will do what you want:

"^[a-z_]\\w*$"

Explanation:

  • ^: start at the beginning of the string
  • [a-z_]: match a single lowercase letter or underscore
  • \\w*: match zero or more word characters (\w is equivalent to [a-zA-Z_0-9])
  • $: match until the end of the string.

Edit: Updated to reflect the "dollar sign allowed" and "name shouldn't start with uppercase" pointed out by the others. Thanks for the reminder.

Edit 2: After doing some research I've removed the matching for dollar signs again. While technically permissible, it's certainly bad style in this context and therefore discouraged, just like variables starting with an uppercase letter. See also https://stackoverflow.com/a/4636667/1814922
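
To illustrate the difference between "technically permissible" and "good style", the sketch below compares the regex from this answer with a character-level check built on Character.isJavaIdentifierStart and Character.isJavaIdentifierPart from the standard library. The class name StyleVersusLanguageCheck and both method names are made up for this example, and the character-level check is an assumption about what "permissible" should mean here: it does not reject keywords and uses the char overloads, so supplementary code points are not handled.

```java
// Hypothetical demo class; the class and method names are illustrative only.
final class StyleVersusLanguageCheck {
    // The style-oriented regex from this answer.
    static boolean passesStyleRegex(String name) {
        return name.matches("^[a-z_]\\w*$");
    }

    // Checks only the character rules of the language.
    // It does not reject reserved words such as "int".
    static boolean hasLegalIdentifierChars(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"total_1", "$total", "Total", "1total", "to tal"};
        for (String s : samples) {
            System.out.println(s + " style=" + passesStyleRegex(s)
                    + " legalChars=" + hasLegalIdentifierChars(s));
        }
    }
}
```

Names such as $total and Total pass the character-level check but fail the regex, which is exactly the trade-off described in the edits above.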

creinig answered Oct 20 '22 08:10