I'm having some issues with making the following regex work. I would like the following string:
"Please enter your name here"
to result in an array with the following elements:
'Please enter', 'enter your', 'your name', 'name here'
Currently, I'm using the following pattern, and then creating a matcher and iterating in the following way:
Pattern word = Pattern.compile("[\\w]+ [\\w]+");
Matcher m = word.matcher("Please enter your name here");
while (m.find()) {
    wordList.add(m.group());
}
But the result I'm getting is:
'Please enter', 'your name'
What am I doing wrong? (P.S. I checked the same regex on regexpal.com and had the same problem.) It seems like the same word won't be matched twice. What can I do to achieve the result I want?
Thanks.
---------------------------------
EDIT: Thanks for all the suggestions! I ended up doing this (because it adds flexibility in being able to easily specify number of "n-grams"):
int nGrams = 2;
String patternTpl = "\\b[\\w']+\\b";
String concatString = "what is your age? please enter your name.";
for (int i = 0; i < nGrams; i++) {
    // Create a pattern that matches i + 1 consecutive words.
    String pattern = patternTpl;
    for (int j = 0; j < i; j++) {
        pattern = pattern + " " + patternTpl;
    }
    // Wrap it in a lookahead so that matches may overlap.
    pattern = "(?=(" + pattern + "))";
    Pattern word = Pattern.compile(pattern);
    Matcher m = word.matcher(concatString);
    // Iterate over all matches and populate wordList.
    while (m.find()) {
        wordList.add(m.group(1));
    }
}
This results in:
Pattern:
(?=(\b[\w']+\b)) // In the first iteration
(?=(\b[\w']+\b \b[\w']+\b)) // In the second iteration
Array:
[what, is, your, age, please, enter, your, name, what is, is your, your age, please enter, enter your, your name]
Note: Got the pattern from the following top answer: Java regex skipping matches
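For anyone who wants to run the lookahead approach from the edit directly, here is a minimal self-contained sketch. The class name NGramDemo and the method ngrams are hypothetical names chosen for illustration, and unlike the loop in the edit, this version returns only the n-grams of one given size rather than accumulating every size up to nGrams.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class NGramDemo {
    // Same single-word template as in the edit above.
    private static final String WORD = "\\b[\\w']+\\b";

    // Returns all runs of n words separated by single spaces.
    static List<String> ngrams(String text, int n) {
        // Join n copies of the word template with spaces, then wrap the
        // result in a lookahead so the matcher consumes nothing and
        // neighbouring n-grams are allowed to overlap.
        String body = String.join(" ", Collections.nCopies(n, WORD));
        Pattern pattern = Pattern.compile("(?=(" + body + "))");
        Matcher m = pattern.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("what is your age? please enter your name.", 2));
    }
}
```

Because the word template only allows a single space between words, an n-gram never spans punctuation such as the "?" after "age".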
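Matches can't overlap: each call to find() resumes where the previous match ended, so a word consumed by one match is not available to the next. That explains your result. Here's a potential workaround, making use of capturing groups with a positive lookahead: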
Pattern word = Pattern.compile("(\\w+)(?=(\\s\\w+))");
Matcher m = word.matcher("Please enter your name here");
while (m.find()) {
    System.out.println(m.group(1) + m.group(2));
}
This prints:
Please enter
enter your
your name
name here
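If you want the pairs in a list rather than printed, the same idea can be sketched as a small runnable class. This is only an illustration under a few assumptions: BigramDemo and bigrams are hypothetical names, and the pattern is a slight variation on the one above (the second capturing group holds just the next word, and \\s+ tolerates more than one whitespace character between words).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class BigramDemo {
    // Group 1 consumes one word; the lookahead captures the following word
    // in group 2 without consuming it, so it can start the next match.
    private static final Pattern BIGRAM = Pattern.compile("(\\w+)(?=\\s+(\\w+))");

    static List<String> bigrams(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = BIGRAM.matcher(text);
        while (m.find()) {
            result.add(m.group(1) + " " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Please enter, enter your, your name, name here]
        System.out.println(bigrams("Please enter your name here"));
    }
}
```

The last word never produces a pair, because the lookahead finds no word after it and the match fails there.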