
Regex with Java

Tags:

java

regex

I need to check for lines that have either one of the following patterns:

preposition word ||| other words or what ever
word preposition ||| other words or what ever

The preposition may be any word from a list such as {de, à, pour, quand, ...}. The other word may or may not be a preposition.

I tried many patterns, such as the following:

File file = new File("test.txt");
Pattern pattern = Pattern.compile("(\\bde\\b|\\bà\\b) \\w.*", Pattern.CASE_INSENSITIVE);
String fileContent = readFileAsString(file.getAbsolutePath());
Matcher match = pattern.matcher(fileContent);
System.out.println(match.replaceAll("c"));

This pattern matches a preposition followed by at least one word before the pipe. What I want is to match a preposition followed by exactly one word before the pipe. I tried the following pattern:

Pattern pattern = Pattern.compile("(\\bde\\b|\\bla\\b)\\s\\w\\s\\|.*",Pattern.CASE_INSENSITIVE);

Unfortunately, this pattern doesn't work!

Dorra asked Jun 12 '26 03:06


1 Answer

For conciseness, I'll use prep as a stand-in for whichever preposition we are dealing with:

Pattern pattern = Pattern.compile("(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*",
                                 Pattern.CASE_INSENSITIVE);    

(?:...) groups without capturing
\\bprep\\b ensures that prep is matched only as a whole word, i.e. it won't match the prep inside preposition
\\w+ requires one or more characters from [a-zA-Z_0-9]
.* at the end applies to both alternatives inside the outer parentheses
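
To make the points above concrete, here is a minimal sketch that runs the pattern against a few sample lines. The class name PrepPatternDemo, the helper found, and the sample lines are made up for illustration, and prep is still only a placeholder for a real preposition.

```java
import java.util.regex.Pattern;

class PrepPatternDemo {
    // "prep" is a placeholder preposition, as in the answer's pattern.
    static final Pattern PATTERN = Pattern.compile(
            "(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern is found anywhere in the line.
    static boolean found(String line) {
        return PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        // First alternative: preposition followed by a word.
        System.out.println(found("prep word ||| other words"));   // true
        // Second alternative: word followed by preposition (case-insensitive).
        System.out.println(found("word PREP ||| other words"));   // true
        // \b prevents a match on the "prep" inside "preposition".
        System.out.println(found("preposition ||| other words")); // false
    }
}
```

Note that find() looks for the pattern anywhere in the input, whereas matches() would require the whole input to match.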

EDIT (in response to comment):
"^(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*" does work. You're most likely running into a case like this:

String myString = "hello prep someWord mindless nonsense";

This matches because it is covered by the second alternative, (?:\\w+ \\bprep\\b), followed by .*.

If you try these, you'll see that the ^ is in fact working:

String myString = "egeg  prep rfb tgnbv";

This can't match the second alternative, since there are two spaces after "egeg". Only the first alternative could match (at "prep rfb"), but the ^ rules that out because that text is not at the start of the string. Additionally:

String myString = "egeg hello prep rfb tgnbv";

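As established, a string like this can't match the first alternative because of the ^. It can't match the second either, since the word after "egeg" is "hello" rather than prep. Nothing matches, which shows that the ^ is in fact working.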
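
The three strings above can be checked side by side with and without the anchor. This is only a sketch: the class name AnchorDemo and its helper methods are made up for illustration, and it assumes find() is the matching call being used.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Same pattern body as above; "prep" is a placeholder preposition.
    static final String BODY = "(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*";
    static final Pattern ANCHORED =
            Pattern.compile("^" + BODY, Pattern.CASE_INSENSITIVE);
    static final Pattern UNANCHORED =
            Pattern.compile(BODY, Pattern.CASE_INSENSITIVE);

    // Hypothetical helpers: search the string with each variant.
    static boolean anchored(String s) { return ANCHORED.matcher(s).find(); }
    static boolean unanchored(String s) { return UNANCHORED.matcher(s).find(); }

    public static void main(String[] args) {
        String[] samples = {
            "hello prep someWord mindless nonsense", // anchored: true,  unanchored: true
            "egeg  prep rfb tgnbv",                  // anchored: false, unanchored: true
            "egeg hello prep rfb tgnbv"              // anchored: false, unanchored: true
        };
        for (String s : samples) {
            System.out.println(anchored(s) + " / " + unanchored(s) + " : " + s);
        }
    }
}
```

Without Pattern.MULTILINE, ^ matches only at the very start of the input. If the whole file is read into one string, as in the question, adding Pattern.MULTILINE would let ^ match at the start of each line.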

Steve P. answered Jun 13 '26 17:06


