I have a Node.js script that reads in a file and counts word frequencies. I currently feed each line into this function:
function getWords(line) {
  return line.match(/\b\w+\b/g);
}
This matches almost everything, except that it splits contractions:
getWords("I'm") -> {"I", "m"}
However, I cannot just include apostrophes, as I would want matched apostrophes to be word boundaries:
getWords("hey'there'") -> {"hey", "there"}
Is there a way to capture contractions while still treating other apostrophes as word boundaries?
The closest I believe you could get with a regex would be line.match(/(?!'.*')\b[\w']+\b/g), but be aware that if there is no space between a word and a ', it will be treated as a contraction.
As Aaron Dufour mentioned, there is no way for the regex by itself to know that I'm is a contraction but hey'there isn't.
See below:
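Here is a minimal sketch of how that pattern behaves on the inputs from the question. The function name getWordsWithContractions is made up for illustration, and the `|| []` fallback is my addition, since match() returns null when nothing matches.

```javascript
// Hypothetical helper name; the regex is the one suggested in this answer.
function getWordsWithContractions(line) {
  // match() returns null when there are no matches, so fall back to [].
  return line.match(/(?!'.*')\b[\w']+\b/g) || [];
}

// The contraction is kept as one word.
const contraction = getWordsWithContractions("I'm");        // ["I'm"]

// Quotes separated from the next word by a space act as boundaries.
const quoted = getWordsWithContractions("'hey' there");     // ["hey", "there"]

// No space after the apostrophe: indistinguishable from a contraction.
const ambiguous = getWordsWithContractions("hey'there'");   // ["hey'there"]

console.log(contraction, quoted, ambiguous);
```

The last call shows the caveat above: the trailing apostrophe is dropped, but hey'there still comes back as a single word.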
You can match letters, optionally followed by an apostrophe and more letters:
line.match(/[A-Za-z]+('[A-Za-z]+)?/g)
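As a rough sketch, this is how that pattern might slot into the word-frequency counting from the question. The countWords name, the Map, and the lowercasing are my assumptions; the question does not show how it counts.

```javascript
// Hypothetical helper: counts word frequencies across an array of lines.
function countWords(lines) {
  const counts = new Map();
  for (const line of lines) {
    // match() returns null for lines without words, so fall back to [].
    const words = line.match(/[A-Za-z]+('[A-Za-z]+)?/g) || [];
    for (const word of words) {
      const key = word.toLowerCase(); // assumption: counting is case-insensitive
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

const counts = countWords(["I'm here", "'I'm' not there, I'm here"]);
console.log(counts.get("i'm"), counts.get("here")); // 3 2
```

Note that this pattern only matches ASCII letters, so digits and underscores are skipped, and hey'there is still read as one word, just as with the first answer.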