
How do I handle contractions with regex word boundaries in JavaScript?

I have a Node.js script that reads in a file and counts word frequencies. I currently feed each line into a function:

function getWords(line) {
    return line.match(/\b\w+\b/g);
}

This matches almost everything, except that it splits contractions at the apostrophe:

getWords("I'm") -> {"I", "m"}

However, I cannot just include apostrophes, as I would want matched apostrophes to be word boundaries:

getWords("hey'there'") -> {"hey", "there"}

Is there a way to capture contractions while still treating other apostrophes as word boundaries?

Ehryk asked Dec 31 '14 02:12



2 Answers

The closest I believe you could get with regex would be line.match(/(?!'.*')\b[\w']+\b/g), but be aware that if there is no space between a word and a ', it will be treated as a contraction (so hey'there comes back as a single word).

As Aaron Dufour mentioned, there would be no way for the regex by itself to know that I'm is a contraction but hey'there isn't.
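To make the trade-off concrete, here is a minimal sketch that wraps the regex from this answer in a function. The name getWordsLookahead and the `|| []` fallback are my own additions for illustration (match() returns null when nothing matches); the regex itself is unchanged.

```javascript
// Hypothetical wrapper around the regex suggested in this answer.
// The `|| []` fallback is an assumption added so callers always get an array.
function getWordsLookahead(line) {
    return line.match(/(?!'.*')\b[\w']+\b/g) || [];
}

// Contractions are kept whole.
console.log(getWordsLookahead("I'm here"));     // -> ["I'm", "here"]

// Quotes around a word are dropped when whitespace separates the words.
console.log(getWordsLookahead("'hey' there"));  // -> ["hey", "there"]

// Without whitespace, the inner apostrophe looks like a contraction.
console.log(getWordsLookahead("hey'there'"));   // -> ["hey'there"]
```

The last call shows the limitation described above: the question's hey'there' case is not split into two words.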

Wesley Smith answered Sep 20 '22 17:09



You can match letters and a possible apostrophe followed by letters.

line.match(/[A-Za-z]+('[A-Za-z]+)?/g)
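As a rough illustration, the pattern can be dropped into a function like the one in the question. The name getWordsLetters and the `|| []` fallback are assumptions of mine, not part of the original answer.

```javascript
// Hypothetical wrapper: letters, optionally followed by one apostrophe plus more letters.
// The `|| []` fallback is an assumption, since match() returns null when nothing matches.
function getWordsLetters(line) {
    return line.match(/[A-Za-z]+('[A-Za-z]+)?/g) || [];
}

// A contraction is kept as one word.
console.log(getWordsLetters("I'm here"));     // -> ["I'm", "here"]

// Leading and trailing apostrophes are never part of a match.
console.log(getWordsLetters("'hey' there"));  // -> ["hey", "there"]

// An apostrophe between letters is always treated as part of the word.
console.log(getWordsLetters("hey'there'"));   // -> ["hey'there"]

// Unlike \w, this character class skips digits and underscores.
console.log(getWordsLetters("route 66"));     // -> ["route"]
```

Note that, like the other answer, this still returns hey'there as one word, and it only counts ASCII letters.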
kennebec answered Sep 18 '22 17:09
