
Regex to exclude a sentence which contains a specific word in Java

Tags: java, regex

I am reading a file which contains lots of information like shown below:

    type dw_3 from u_dw within w_pg6p0012_01
    boolean visible = false
    integer x = 1797
    integer y = 388
    integer width = 887
    integer height = 112
    integer taborder = 0
    boolean bringtotop = true
    string dataobject = "d_pg6p0012_14"
    end type

    type dw_3 from u_dw within w_pg6p0012_01
    integer x = 1797
    integer y = 388
    integer width = 887
    integer height = 112
    integer taborder = 0
    boolean bringtotop = true
    string dataobject = "d_pg6p0012_14"
    end type

I made this regex: (?i)type dw_\d\s+(.*?)\s+within(.*?)\s+(?!boolean visible = false)(.*)

I want to extract all the strings which do not contain "boolean visible = false", but mine returns all of them. I also tried the approaches from many similar posts on Stack Overflow, with the same result. Please suggest a way.

solution :(?i)type\\s+dw_(\\d+|\\w+)\\s+from\\s+.*?within\\s+.*?\\s+(string|integer)?\\s+.*\\s+.*\\s+.*\\s+.*?\\s+.*?\\s+.*?\\s*string\\s+dataobject\\s+=\\s+(.*?)\\s+end\\s+type")

This works well in a regex checker, but when I tried it in Java it keeps running without giving any output.

asked Jan 04 '17 05:01 by SOP



2 Answers

You can use this RegEx

(\s*boolean visible = false)|(.*)

This defines two capture groups:

  1. The first capture group, (\s*boolean visible = false), catches boolean visible = false together with any whitespace before it.

  2. The second capture group, (.*), captures everything else, that is, whatever was not consumed by the first group.

When extracting, read only the second group and ignore the first one.


Edit

Here's an example for clarification:

In this example,

  • the getOriginalFileContents() method returns the file contents shown in the question.
  • Notice that the pattern defines both groups, but the loop ignores the first group and prints only the second one.

The output no longer contains the line boolean visible = false.

Output

 type dw_3 from u_dw within w_pg6p0012_01
 integer x = 1797
 integer y = 388
 integer width = 887
 integer height = 112
 integer taborder = 0
 boolean bringtotop = true
 string dataobject = "d_pg6p0012_14"
 end type


 type dw_3 from u_dw within w_pg6p0012_01
 integer x = 1797
 integer y = 388
 integer width = 887
 integer height = 112
 integer taborder = 0
 boolean bringtotop = true
 string dataobject = "d_pg6p0012_14"
 end type

Java Implementation

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTut3 {

    public static void main(String[] args) {
        String file = getOriginalFileContents();
        // Group 1 swallows the unwanted line; group 2 catches every other line.
        Pattern pattern = Pattern.compile("(\\s*boolean visible = false)|(.*)");
        Matcher matcher = pattern.matcher(file);
        while (matcher.find()) {
            // Ignore group 1. Group 2 is null when group 1 matched, and empty
            // when (.*) matched zero characters at a line break, so skip both.
            String kept = matcher.group(2);
            if (kept != null && !kept.isEmpty()) System.out.println(kept);
        }
    }

    // This method just returns the file contents as displayed in the
    // question.
    private static String getOriginalFileContents() {
        String s = "     type dw_3 from u_dw within w_pg6p0012_01\n" +
            "     boolean visible = false\n" +
            "     integer x = 1797\n" +
            "     integer y = 388\n" +
            "     integer width = 887\n" +
            "     integer height = 112\n" +
            "     integer taborder = 0\n" +
            "     boolean bringtotop = true\n" +
            "     string dataobject = \"d_pg6p0012_14\"\n" +
            "     end type\n" +
            "     \n" +
            "     type dw_3 from u_dw within w_pg6p0012_01\n" +
            "     integer x = 1797\n" +
            "     integer y = 388\n" +
            "     integer width = 887\n" +
            "     integer height = 112\n" +
            "     integer taborder = 0\n" +
            "     boolean bringtotop = true\n" +
            "     string dataobject = \"d_pg6p0012_14\"\n" +
            "     end type";

        return s;
    }
}
answered Oct 03 '22 03:10 by Raman Sahasi


It will be much easier (and more readable) if you make a regex to match "boolean visible = false" and then exclude those lines that contain a match for it.

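Pattern pattern = Pattern.compile("boolean visible = false");

// filepath is a java.nio.file.Path; try-with-resources closes the file when done
try (Stream<String> lines = Files.lines(filepath)) {
    lines.filter(line -> !pattern.matcher(line).find())  // note the "!"
         .forEach(/* do stuff */);
}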

Notes:

  • Because we are using Files#lines(Path), the regex does not need to split the input into separate lines. This is already done for us.
  • The Matcher#find() method returns whether the given character sequence contains a match for the regex anywhere in it. I believe this is what you want.
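
If the goal is to drop every whole "type ... end type" block that contains the line, rather than just the line itself, the same find() check can be applied per block. The following is only a sketch under that reading of the question: the class name BlockFilterSketch and the method visibleBlocks are made up for illustration, and it assumes every block ends with a line reading "end type".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class BlockFilterSketch {
    private static final Pattern HIDDEN = Pattern.compile("boolean visible = false");

    // Groups lines into blocks that end with "end type" (assumed terminator)
    // and keeps only the blocks in which the pattern is not found.
    static List<String> visibleBlocks(List<String> lines) {
        List<String> kept = new ArrayList<>();
        StringBuilder block = new StringBuilder();
        for (String line : lines) {
            if (block.length() == 0 && line.trim().isEmpty()) {
                continue;                                // skip blank lines between blocks
            }
            block.append(line).append('\n');
            if (line.trim().equals("end type")) {        // block is complete
                if (!HIDDEN.matcher(block).find()) {     // note the "!"
                    kept.add(block.toString());
                }
                block.setLength(0);                      // start collecting the next block
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "type dw_3 from u_dw within w_pg6p0012_01",
                "boolean visible = false",
                "string dataobject = \"d_pg6p0012_14\"",
                "end type",
                "",
                "type dw_3 from u_dw within w_pg6p0012_01",
                "integer x = 1797",
                "string dataobject = \"d_pg6p0012_14\"",
                "end type");
        // Prints only the second block, which has no "boolean visible = false" line.
        visibleBlocks(lines).forEach(System.out::print);
    }
}
```

A trailing block with no "end type" line is silently dropped by this sketch; whether that is acceptable depends on the input files.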

EDIT:

Now, if you are just really intent on using a pure regex, then try this:

^((?!boolean visible = false).)+$

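This will match an entire (non-empty) line if and only if it does not contain "boolean visible = false" anywhere within it. No fancy backreferences / capture group semantics needed to extract the desired text. When the pattern is run against the whole file rather than one line at a time, it needs the multiline flag ((?m), or Pattern.MULTILINE in Java) so that ^ and $ anchor at every line.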

See proof by unit tests here: https://regex101.com/r/dbzdMB/1
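
In Java, a rough sketch of running this pattern over a multi-line string could look like the following. The class name NegativeLookaheadSketch and the method matchingLines are hypothetical; the one detail that matters is Pattern.MULTILINE, without which ^ and $ only anchor at the ends of the whole input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookaheadSketch {
    // MULTILINE makes ^ and $ match at line boundaries, not only at the
    // start and end of the whole input.
    private static final Pattern LINE_WITHOUT_TARGET =
            Pattern.compile("^((?!boolean visible = false).)+$", Pattern.MULTILINE);

    // Returns every non-empty line that does not contain the target string.
    static List<String> matchingLines(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = LINE_WITHOUT_TARGET.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());   // the whole matched line
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "type dw_3 from u_dw within w_pg6p0012_01\n"
                + "    boolean visible = false\n"
                + "integer x = 1797\n"
                + "end type";
        // Prints the three lines other than "boolean visible = false".
        matchingLines(text).forEach(System.out::println);
    }
}
```

The lookahead is re-checked before every single character, so on large files the plain line filter shown earlier is likely the cheaper option.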


EDIT #2:

Alternatively, if all you are trying to do is to get the file text without any "boolean visible = false", then you could simply replace every instance of that target string with the empty string.

Pattern pattern = Pattern.compile("boolean visible = false");
Matcher matcher = pattern.matcher(fileAsCharSequence);  // e.g. StringBuilder
String output = matcher.replaceAll("");
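
One caveat with replacing only the target string: the indentation and line break around it stay behind, so each removed line leaves a whitespace-only line in the output. A possible variant, sketched here with the made-up names RemoveLineSketch and withoutTargetLines, also consumes the surrounding horizontal whitespace and the line break. It assumes the target sits alone on its line, as in the sample file.

```java
import java.util.regex.Pattern;

class RemoveLineSketch {
    // (?m) lets ^ match at the start of every line; \h* absorbs indentation
    // and trailing spaces; \R? absorbs the line break, if there is one.
    private static final Pattern TARGET_LINE =
            Pattern.compile("(?m)^\\h*boolean visible = false\\h*\\R?");

    // Removes every line that consists only of the target string.
    static String withoutTargetLines(CharSequence text) {
        return TARGET_LINE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "type dw_3 from u_dw within w_pg6p0012_01\n"
                + "    boolean visible = false\n"
                + "    integer x = 1797\n"
                + "end type";
        System.out.println(withoutTargetLines(text));
    }
}
```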
answered Oct 03 '22 04:10 by Travis