I am trying to read in a text file named by user input using Scanner and split the file into words according to several rules. One rule is that an apostrophe at the beginning or end of a word should be treated as a delimiter, while an apostrophe inside a word should be left alone. For example: if the scanner sees a word such as 'tis, then Scanner.useDelimiter() should strip the apostrophe and leave the word "tis", but if it sees a word like "don't" then it should leave the word as is.
I am using a regular expression to cover the multiple cases that the delimiter should split on. The regex is mostly doing what I need, but for some reason my results include an extra blank entry before any word that is preceded by a space and then an apostrophe. I am new to regex and I don't know how to fix this problem, so any suggestions would be greatly appreciated.
Below are the words in my text file:
'Twas the night before christmas! But don't open your presents. 'Tis the only way to celebrate.
Code:
public static void main(String[] args) throws FileNotFoundException {
    // Delimiter: punctuation/whitespace other than ', or an apostrophe at a word edge
    Pattern p = Pattern.compile("[\\p{Punct}\\s&&[^']]+|('(?![\\w]))+|((?<![\\w])')+");
    System.out.println("Please enter a text file name.");
    Scanner sc = new Scanner(System.in);
    File file = new File(sc.nextLine());
    Scanner nSc = new Scanner(file); // throws FileNotFoundException if the file is missing
    nSc.useDelimiter(p);
    while (nSc.hasNext()) {
        String word = nSc.next().toLowerCase();
        System.out.println(word);
    }
    nSc.close();
}
Expected:
twas
the
night
before
christmas
but
don't
open
your
presents
tis
the
only
way
to
celebrate
Actual:
twas
the
night
before
christmas
but
don't
open
your
presents

tis
the
only
way
to
celebrate
You can use the regex, '?\b\w+'?\w+\b to grab the desired words from the string and then replace the regex, ^'(.*) with $1 where $1 specifies group(1). The ^ anchor restricts the replacement to a leading apostrophe, so the apostrophe inside a word like don't is kept.
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        String str = "'Twas the night before christmas! But don't open your presents. 'Tis the only way to celebrate.";
        // Matcher.results() requires Java 9 or later
        List<String> list = Pattern.compile("'?\\b\\w+'?\\w+\\b")
                .matcher(str)
                .results()
                // Strip only a leading apostrophe from each match
                .map(r -> r.group().replaceAll("^'(.*)", "$1"))
                .collect(Collectors.toList());
        System.out.println(list);
    }
}
Output:
[Twas, the, night, before, christmas, But, don't, open, your, presents, Tis, the, only, way, to, celebrate]
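One caveat worth knowing: because \w+'?\w+ needs at least two word characters, a one-letter word such as "a" or "I" would be skipped on other input. The following is only a sketch of a possible variant, not part of the approach above; the pattern \w+(?:'\w+)* and the names WordMatchSketch and words are my own assumptions for illustration. It matches a run of word characters with optional inner apostrophes, so leading and trailing apostrophes never become part of a match and no replace step is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatchSketch {
    // Hypothetical pattern (an assumption, not from the answer above):
    // word characters, optionally followed by groups of ' plus word characters.
    private static final Pattern WORD = Pattern.compile("\\w+(?:'\\w+)*");

    // Collects every match in order of appearance.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Tis, a, gift, I, don't, want]
        System.out.println(words("'Tis a gift I don't want"));
    }
}
```

Note that \w also matches digits and underscores, so this may need tightening if the input contains them.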
Explanation of the regex, '?\b\w+'?\w+\b:
\b specifies a word boundary.
\w+ specifies one or more word characters.
'? specifies an optional '.
If you are not familiar with the Stream API, you can do it as follows:
Scanner nSc = new Scanner(file);
// Compile the pattern once, outside the loop
Pattern pattern = Pattern.compile("'?\\b\\w+'?\\w+\\b");
while (nSc.hasNextLine()) {
    String line = nSc.nextLine().toLowerCase();
    Matcher matcher = pattern.matcher(line);
    while (matcher.find()) {
        String word = matcher.group();
        // ^ anchors the match to the start, so only a leading apostrophe is removed
        System.out.println(word.replaceAll("^'(.*)", "$1"));
    }
}
nSc.close();
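If you would rather keep the useDelimiter approach from the question, the likely cause of the blank entry is that each + in the original pattern applies to one alternative only. In ". 'Tis" the ". " is consumed by the first alternative and the ' by the third, so Scanner sees two back-to-back delimiters and returns the empty token between them. A minimal sketch of one possible fix is to wrap all the alternatives in a single group and repeat the group, so the whole run counts as one delimiter. The names DelimiterSketch and tokens are assumptions for illustration, and the sketch reads from a String instead of a file to stay self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class DelimiterSketch {
    // One or more of: punctuation/whitespace other than ', an apostrophe not
    // followed by a word character, or an apostrophe not preceded by one.
    private static final Pattern DELIM =
            Pattern.compile("(?:[\\p{Punct}\\s&&[^']]|'(?!\\w)|(?<!\\w)')+");

    // Splits the text with Scanner and lowercases each token.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter(DELIM);
            while (sc.hasNext()) {
                result.add(sc.next().toLowerCase());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "'Twas the night before christmas! But don't open your presents. 'Tis the only way to celebrate.";
        // Print one token per line, as in the question
        for (String word : tokens(str)) {
            System.out.println(word);
        }
    }
}
```

With a file, the same pattern can be passed to nSc.useDelimiter(...) exactly as in the question's code.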