
Detecting variables through a string

I am creating a simple IDE using JTextPane and detecting keywords and coloring them.

Currently, I am able to detect:

  1. Comments
  2. String Literals
  3. Integers & Floats
  4. Keywords

I detect each of these with regular expressions.

Now I am trying to detect variables, as in [int x = 10;], and give them a different color.

Currently, I am able to find all data types such as int, float, and char using the following regex:

Pattern words = Pattern.compile("\\bint\\b|\\bfloat\\b|\\bchar\\b");
Matcher matcherWords = words.matcher(code);
while (matcherWords.find()) {
    System.out.print(code.substring(matcherWords.start(), matcherWords.end()));
    // How to get next word that is a variable?
}

Below is a sample output of my program:

[image: sample program output]

How can I detect variable names such as a, b, and c once I have matched int, float, etc.?

user3188291 asked Jul 30 '15 18:07



1 Answer

Try this one:

(?:(?<=int|float|String|double|char|long)(?:\s+[a-zA-Z_$][\w$]*\s*)|(?<=\G,)(?:\s*[a-zA-Z_$][\w$]*\s*))(?=,|;|=)

which means:

  • (?<=int|float|String|double|char|long) - positive lookbehind that looks for the variable type,
  • (?:\s+[a-zA-Z_$][\w$]*\s*) - non-capturing group: at least one whitespace character, followed by a valid Java variable name, followed by zero or more whitespace characters,
  • | - or; alternation between matching a name after a variable type and matching a name after a comma,
  • (?<=\G,) - positive lookbehind for the end of the previous match followed by a comma (the previous match already consumed any surrounding whitespace, so it ends right at the comma),
  • (?:\s*[a-zA-Z_$][\w$]*\s*) - non-capturing group: zero or more whitespace characters, followed by a valid Java variable name, followed by zero or more whitespace characters,
  • (?=,|;|=) - positive lookahead for a comma, semicolon, or equals sign.

It uses the \G boundary matcher (the end of the previous match), so the second alternative, which matches names that follow other names (words separated by commas and optional whitespace), succeeds only directly after a previous match. This means it will not match every word between commas inside String literals, for example. I also added $ to [a-zA-Z_$][\w$]* because it is allowed in variable names, although not recommended.


And for Java:

 Pattern pattern = Pattern.compile("(?:(?<=int|float|String|double|char|long)(?:\\s+[a-zA-Z_$][\\w$]*\\s*)|(?<=\\G,)(?:\\s*[a-zA-Z_$][\\w$]*\\s*))(?=,|;|=)");
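To see the effect of \G, here is a minimal sketch that runs the pattern above over two snippets and collects the trimmed matches. The class name DeclaredNames and the helper names(...) are made up for illustration; only the pattern itself comes from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeclaredNames {
    // The pattern from the answer, unchanged.
    static final Pattern VARIABLES = Pattern.compile(
        "(?:(?<=int|float|String|double|char|long)(?:\\s+[a-zA-Z_$][\\w$]*\\s*)"
        + "|(?<=\\G,)(?:\\s*[a-zA-Z_$][\\w$]*\\s*))(?=,|;|=)");

    // Hypothetical helper: returns every matched name with surrounding whitespace removed.
    static List<String> names(String code) {
        List<String> result = new ArrayList<>();
        Matcher matcher = VARIABLES.matcher(code);
        while (matcher.find()) {
            // The whole match still contains the whitespace around the name.
            result.add(matcher.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // A declaration list: each name after a comma directly follows the previous match.
        System.out.println(names("int d,f,g;"));
        // A string literal with commas: nothing inside it follows a previous match.
        System.out.println(names("String s = \"a,b,c\";"));
    }
}
```

The declaration list should yield d, f and g, while the line with the string literal should contribute only s, because the comma-separated letters inside the literal are not preceded by the end of a previous match.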

EDIT

You can put the space into the lookbehind, as in (int |float |...), so that matcher.start() and matcher.end() give you the variable name directly, without leading spaces. However, I would rather allow (?:\s*) in every place where whitespace can occur and then strip the redundant whitespace while processing the data, because you never know how many spaces the user will type (more than one is redundant, of course, but it is still valid!).

Another approach would be to match the spaces but capture the names in groups, like:

(?:(?<=int|float|String|double|char|long)(?:\s+)([a-zA-Z_$][\w$]*)(?:\s*)|(?<=\G,)(?:\s*)([a-zA-Z_$][\w$]*)(?:\s*))(?=,|;|=)


The captured names contain no spaces, but you need to extract them from groups 1 and 2 with matcher.start(group number) and matcher.end(group number).
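One caveat with the lookbehind as written: it is not anchored at a word boundary, so an identifier that merely ends in a type name (print, for example) also satisfies it. A possible tightening is to put \b in front of the type alternation inside the lookbehind. This variant is an assumption layered on top of the grouped pattern, not part of it, and the class, constant and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictDeclaredNames {
    static final String TYPES = "int|float|String|double|char|long";
    static final String NAME = "([a-zA-Z_$][\\w$]*)";

    // The grouped pattern from the answer, assembled from the two pieces above.
    static final Pattern LOOSE = Pattern.compile(
        "(?:(?<=" + TYPES + ")(?:\\s+)" + NAME + "(?:\\s*)"
        + "|(?<=\\G,)(?:\\s*)" + NAME + "(?:\\s*))(?=,|;|=)");

    // Assumed variant: \b requires the type name to start at a word boundary.
    static final Pattern STRICT = Pattern.compile(
        "(?:(?<=\\b(?:" + TYPES + "))(?:\\s+)" + NAME + "(?:\\s*)"
        + "|(?<=\\G,)(?:\\s*)" + NAME + "(?:\\s*))(?=,|;|=)");

    // Returns the captured names; group 1 is set after a type, group 2 after a comma.
    static List<String> names(Pattern pattern, String code) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(code);
        while (matcher.find()) {
            result.add(matcher.group(1) != null ? matcher.group(1) : matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String code = "print x = 1; int y, z;";
        System.out.println("loose:  " + names(LOOSE, code));
        System.out.println("strict: " + names(STRICT, code));
    }
}
```

On this input the unanchored pattern should also report x, because print ends in int, while the anchored variant should report only y and z.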

EDIT 2: answer to a question from a comment

It depends on what you want to achieve. If you just want the variable names as Strings, calling trim() is enough. But if you want the start and end indices of the variables in the text, for example to highlight them in a different colour, it is better to use something like matcher.start(1) to get the start index of group 1. Consider this example:

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) throws IOException {
        String      text = "int a = 100;\n" +
                "float b = 100.10;\n" +
                "double c - 12.454545645;\n" +
                "long longest dsfsf = 453543543543;\n" +
                "a = d;\n" +
                "char     b = 'a';\n" +
                "String str = \"dfssffdsdfsd\"\n" +
                "int d,f,g;\n" +
                "int a,f,frhg = 0;\n" +
                "String string = \"a,b,c,d,e,f\"";

        Pattern pattern = Pattern.compile("(?:(?<=int|float|String|double|char|long)(?:\\s+)([a-zA-Z_$][\\w$]*)(?:\\s*)|(?<=\\G,)(?:\\s*)([a-zA-Z_$][\\w$]*)(?:\\s*))(?=,|;|=)");
        Matcher matcher = pattern.matcher(text);
        while(matcher.find()){
            System.out.println("trim(): " + text.substring(matcher.start(),matcher.end()).trim()); // cut off spaces by trim() method;

            int group = (matcher.group(1)==null)? 2 : 1; // check which group captured string;
            System.out.println("group(" + group + "): \n\t"  // to extract string by group capturing;
                    + text.substring(matcher.start(group),matcher.end(group))
                    + ",\n\tsubstring(" + matcher.start(group) + "," + matcher.end(group)+")");

        }
    }
}

The output shows both approaches.
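To tie the group indices back to the JTextPane from the question, here is a rough sketch of how they could drive the coloring. It assumes the text lives in a StyledDocument (what JTextPane.getStyledDocument() returns); the class VariableHighlighter and its highlight method are hypothetical names, and the color is arbitrary.

```java
import java.awt.Color;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

class VariableHighlighter {
    // The grouped pattern from the answer, unchanged.
    private static final Pattern VARIABLES = Pattern.compile(
        "(?:(?<=int|float|String|double|char|long)(?:\\s+)([a-zA-Z_$][\\w$]*)(?:\\s*)"
        + "|(?<=\\G,)(?:\\s*)([a-zA-Z_$][\\w$]*)(?:\\s*))(?=,|;|=)");

    // Colors every matched variable name and returns how many were found.
    static int highlight(StyledDocument doc, Color color) throws BadLocationException {
        SimpleAttributeSet attributes = new SimpleAttributeSet();
        StyleConstants.setForeground(attributes, color);

        // Read the text from the document so the offsets line up with it.
        String text = doc.getText(0, doc.getLength());
        Matcher matcher = VARIABLES.matcher(text);
        int count = 0;
        while (matcher.find()) {
            int group = (matcher.group(1) == null) ? 2 : 1; // which group captured the name
            int start = matcher.start(group);
            int length = matcher.end(group) - start;
            doc.setCharacterAttributes(start, length, attributes, false);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws BadLocationException {
        // A bare document stands in for textPane.getStyledDocument() here.
        StyledDocument doc = new DefaultStyledDocument();
        doc.insertString(0, "int a, b = 1;", null);
        System.out.println("highlighted: " + highlight(doc, Color.BLUE));
    }
}
```

With a real text pane the call would look like highlight(textPane.getStyledDocument(), Color.BLUE). Reading the text through the document rather than through textPane.getText() is deliberate: the latter can return platform line separators, which would shift the offsets.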

m.cekiera answered Sep 21 '22 12:09