I need to find lines that match either of the following patterns:
preposition word ||| other words or whatever
word preposition ||| other words or whatever
The preposition may be any word from a list such as {de, à, pour, quand, ...}; the other word may or may not be a preposition.
I tried many patterns, like the following:
File file = new File("test.txt");
Pattern pattern = Pattern.compile("(\\bde\\b|\\bà\\b) \\w.*",Pattern.CASE_INSENSITIVE);
String fileContent = readFileAsString(file.getAbsolutePath());
Matcher match = pattern.matcher(fileContent);
System.out.println( match.replaceAll("c"));
This pattern matches a preposition followed by one or more words before the pipe. What I want is to match a preposition followed by exactly one word before the pipe. I tried the following pattern:
Pattern pattern = Pattern.compile("(\\bde\\b|\\bla\\b)\\s\\w\\s\\|.*",Pattern.CASE_INSENSITIVE);
Unfortunately, this pattern doesn't work!
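A likely reason the second attempt fails: \\w matches a single word character, so \\s\\w\\s only accepts one-character words; \\w+ is needed to match a whole word. The sketch below is an illustration under stated assumptions, not code from the question: it assumes the shortened preposition list {de, à, pour, quand}, requires the literal "|||" right after the two words, and uses a hypothetical class name, OneWordBeforePipe. UNICODE_CHARACTER_CLASS is added so that \\w also accepts accented letters such as the ô and é in "côté".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OneWordBeforePipe {
    // Assumption: shortened, hypothetical preposition list (\u00e0 is a-grave).
    private static final String PREP = "(?:de|\u00e0|pour|quand)";

    // ^            start of a line (MULTILINE makes ^ work per line)
    // PREP word    first form: preposition, then exactly one word
    // word PREP    second form: exactly one word, then the preposition
    // \\|\\|\\|    the literal "|||" separator must follow the two words
    // [ \\t] is used instead of \\s so a match can never span a line break.
    static final Pattern PATTERN = Pattern.compile(
            "^(?:" + PREP + "[ \\t]+\\w+|\\w+[ \\t]+" + PREP + ")[ \\t]*\\|\\|\\|.*",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
                    | Pattern.UNICODE_CHARACTER_CLASS | Pattern.MULTILINE);

    // True if the line has one of the two "preposition + one word" forms before "|||".
    static boolean matches(String line) {
        return PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        String text = "de maison ||| house\n"
                + "chez de ||| at\n"
                + "de ma maison ||| my house\n";
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            System.out.println("matched: " + m.group());
        }
    }
}
```

The prepositions in the list need no \\b here, because ^, the surrounding whitespace and the "|||" already delimit them; "dessert maison ||| x", for example, is rejected.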
For conciseness, I'll use prep to stand in for whichever preposition we are dealing with:
Pattern pattern = Pattern.compile("(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*",
Pattern.CASE_INSENSITIVE);
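A small harness for trying this pattern, with "prep" standing in for a real preposition as above; the class and method names are hypothetical. It contrasts Matcher.find(), which accepts a match anywhere in the line, with Matcher.matches(), which requires the whole line to match and therefore implicitly starts at the first character. Note that the pattern never looks at "|||": with matches(), a line such as "prep a b ||| x" is still accepted, because the first alternative matches "prep a" and .* consumes the rest.

```java
import java.util.regex.Pattern;

class PrepPatternDemo {
    // The pattern from the answer, unanchored.
    static final Pattern PATTERN = Pattern.compile(
            "(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*",
            Pattern.CASE_INSENSITIVE);

    // find(): the match may start anywhere in the line.
    static boolean foundAnywhere(String line) {
        return PATTERN.matcher(line).find();
    }

    // matches(): the entire line must match, so the match starts at index 0.
    static boolean matchesWholeLine(String line) {
        return PATTERN.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {"prep word ||| x", "word prep ||| x", "a b prep c ||| x"};
        for (String line : lines) {
            System.out.println(line + " -> find=" + foundAnywhere(line)
                    + ", matches=" + matchesWholeLine(line));
        }
    }
}
```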
(?:...) groups without capturing.
\\bprep\\b ensures that prep is matched only as a whole word, i.e. it won't match the "prep" at the start of "preposition".
\\w+ requires one or more word characters ([a-zA-Z_0-9] by default).
.* at the end applies to both alternatives and consumes the rest of the line, whichever alternative matched.
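To use a real list of prepositions instead of prep, the alternation can be generated from the list. This is a sketch assuming Java 9+ (for List.of); the class name PrepositionPatternBuilder is hypothetical. Each entry is escaped with Pattern.quote, a ^ anchor is included (with MULTILINE, so it applies per line), and UNICODE_CHARACTER_CLASS is set: without it \\w only covers [a-zA-Z_0-9], so a word like "côté" would not match, and depending on the JDK version \\b may not find a boundary around "à" either.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PrepositionPatternBuilder {
    // Builds the answer's pattern for a whole list of prepositions instead of "prep".
    static Pattern build(List<String> prepositions) {
        // Pattern.quote escapes each entry, so regex metacharacters are taken literally.
        String prep = prepositions.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
        String regex = "^(?:(?:\\b" + prep + "\\b \\w+)|(?:\\w+ \\b" + prep + "\\b)).*";
        // UNICODE_CHARACTER_CLASS makes \w and \b treat accented letters as word characters.
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
                | Pattern.UNICODE_CHARACTER_CLASS | Pattern.MULTILINE);
    }

    public static void main(String[] args) {
        // The second entry is a-grave (U+00E0).
        Pattern p = build(List.of("de", "\u00e0", "pour", "quand"));
        System.out.println(p.matcher("pour moi ||| for me").find());
    }
}
```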
EDIT (in response to comment):
The pattern "^(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*" does work; you are most likely running into a case like this:
String myString = "hello prep someWord mindless nonsense";
This matches because "hello prep" is matched by the second alternative, (?:\\w+ \\bprep\\b), and .* consumes the rest.
If you try these, you'll see that the ^ is in fact working:
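String myString = "egeg  prep rfb tgnbv";
This cannot match the second alternative, because there are two spaces after "egeg" and the pattern allows only one. Only the first alternative ("prep rfb") is left, and it does not match because the ^ requires the match to start at the beginning of the string. Additionally: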
String myString = "egeg hello prep rfb tgnbv";
This line matches neither alternative at its start ("egeg hello" is neither "prep word" nor "word prep"), so it does not match, which again shows that the ^ is working. Without the ^, "hello prep" would match the second alternative.
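The three example strings can be checked side by side. This sketch (class name hypothetical) runs the pattern with and without the ^; the second string is written with two spaces after "egeg", as the explanation assumes.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // The answer's pattern without and with the ^ anchor ("prep" stands in for a preposition).
    static final Pattern UNANCHORED = Pattern.compile(
            "(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*", Pattern.CASE_INSENSITIVE);
    static final Pattern ANCHORED = Pattern.compile(
            "^(?:(?:\\bprep\\b \\w+)|(?:\\w+ \\bprep\\b)).*", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String[] examples = {
            "hello prep someWord mindless nonsense",
            "egeg  prep rfb tgnbv",        // two spaces after "egeg"
            "egeg hello prep rfb tgnbv"
        };
        for (String s : examples) {
            System.out.printf("%-40s unanchored=%b anchored=%b%n", "\"" + s + "\"",
                    UNANCHORED.matcher(s).find(), ANCHORED.matcher(s).find());
        }
    }
}
```

The unanchored pattern finds a match in all three strings, while the anchored one matches only the first.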