Count sentences in string with JavaScript

There are already a couple of similar questions:

  • Splitting textarea sentences into array and finding out which sentence changed on keyup()
  • JS RegEx to split text into sentences
  • Javascript RegExp for splitting text into sentences and keeping the delimiter
  • Split string into sentences in javascript

My situation is a bit different.

I need to count the number of sentences in a string.

The closest answer to what I need would be:

str.replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|")

The only problem here is that this RegEx assumes a sentence starts with a capital letter, which may not always be the case.
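A small illustration of that limitation, as a sketch only: `splitOnCapital` and the sample string are hypothetical names introduced here, wrapping the expression quoted above.

```javascript
// Hypothetical helper wrapping the expression quoted in the question.
function splitOnCapital(str) {
  // A boundary is only recognised when the next sentence starts with A-Z.
  return str.replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|");
}

// Three sentences, two of which start with a lowercase letter.
const sample = "hello there. how are you? Fine.";
const parts = splitOnCapital(sample);

console.log(parts);        // [ 'hello there. how are you?', 'Fine.' ]
console.log(parts.length); // 2, although the sample contains three sentences
```

Only the boundary before "Fine." is detected, because it is the only sentence that starts with a capital letter.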

To be more specific, I would define a sentence as:

  • Starting with a letter (capital or not), a number or even a symbol (such as $ or €).
  • Ending with a punctuation sign, such as a " . ", a " ? " or a " ! ".

However, if a sentence contains a number, which itself contains a " . " or a " , ", then the sentence should be considered as one sentence and not two.

Last but not least, we can assume that, except for the first sentence, a sentence is preceded by a space.

Given a random string, how can I count the number of sentences it contains with JavaScript (or CoffeeScript for that matter)?

Thibaud Clement asked Feb 05 '16 02:02


3 Answers

One regex to solve your problem is:

\w[.?!](\s|$)

The parts are as follows:

\w - Word character
[.?!] - Punctuation as specified.
(\s|$) - Whitespace character OR the end of the string.
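
To turn the pattern into an actual count, one option is to collect all matches with the g flag. This is a minimal sketch, not part of the original answer: `countSentences` is a hypothetical name, and the group is written as non-capturing, which does not change what is matched.

```javascript
// Hypothetical helper: counts matches of the pattern described above.
function countSentences(text) {
  // With the g flag, match() returns every match (or null if there is none).
  // (?:\s|$) is the non-capturing form of the (\s|$) group.
  const matches = text.match(/\w[.?!](?:\s|$)/g);
  return matches === null ? 0 : matches.length;
}

console.log(countSentences("Hello world. How are you? Fine!")); // 3
console.log(countSentences("It costs 3.50 dollars. ok."));      // 2
console.log(countSentences(""));                                // 0
```

The period inside 3.50 is not counted because it is followed by a digit rather than by whitespace or the end of the string.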

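You might be tempted to use a character class instead of a group for the final element:

[\s|$]

But that does not work (it fails on https://regex101.com/, for example): inside a character class, | and $ are literal characters, so $ no longer means the end of the string.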
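
The difference between the group and the character class can be checked directly. A short sketch, where `withGroup`, `withClass` and the test strings are names chosen here for illustration:

```javascript
// Group: whitespace OR the end of the string.
const withGroup = /\w[.?!](\s|$)/;
// Character class: one character that is whitespace, a literal "|" or a literal "$".
const withClass = /\w[.?!][\s|$]/;

console.log(withGroup.test("The end."));  // true: "." is followed by the end of the string
console.log(withClass.test("The end."));  // false: no character follows the "."
console.log(withClass.test("The end.$")); // true: a literal "$" follows the "."
```

Inside square brackets, `$` loses its end-of-string meaning, which is why the class version misses the last sentence of a string.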

Tested on the following:

Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.

This finds six sentences. Note that the capturing group (\s|$) might pose a problem if you depend on match groups for any reason.

Aaron W. answered Sep 27 '22 00:09


I figured out a much simpler solution.

// `text` holds the input string. Pad it so that a final "." is also followed by a space.
const padded = text + " ";
const count = padded.split(". ").length - 1;
console.log(count);
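
Note that splitting on ". " only counts sentences that end with a period. The sketch below compares that approach with a variant that also accepts "?" and "!"; both function names and the sample string are hypothetical, and the variant is an assumption about how the same idea could be extended, not something from this answer.

```javascript
// The approach above: pad the text, then split on a period followed by a space.
function countByPeriodSpace(text) {
  return (text + " ").split(". ").length - 1;
}

// Assumed extension: split on ".", "?" or "!" followed by any whitespace character.
function countByTerminatorSpace(text) {
  return (text + " ").split(/[.?!]\s/).length - 1;
}

const sample = "It costs 3.50 dollars. Really? Yes.";
console.log(countByPeriodSpace(sample));     // 2: "Really?" is not counted
console.log(countByTerminatorSpace(sample)); // 3
```

In both versions the period inside 3.50 is ignored, because it is not followed by whitespace.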

Jens Törnell answered Sep 23 '22 00:09


This works if every sentence in the string ends with a single character (".", "!" or "?").

const text = ""; // insert your string here
const re = /[.!?]/;
const parts = text.split(re); // pieces between sentence-ending characters
console.log(parts.length - 1); // number of sentence-ending characters found
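
Because this splits on every ".", "!" or "?", it also counts decimal points and each dot of an ellipsis. A possible refinement, sketched under the question's assumption that a sentence end is followed by a space or by the end of the string, is to count runs of terminators instead. Both function names and the sample string are hypothetical.

```javascript
// The approach above: every ".", "!" or "?" counts as one sentence end.
function countEveryTerminator(text) {
  return text.split(/[.!?]/).length - 1;
}

// Assumed refinement: a run of terminators counts once, and only when it is
// followed by whitespace or by the end of the string.
function countTerminatorRuns(text) {
  const matches = text.match(/[.!?]+(?=\s|$)/g);
  return matches === null ? 0 : matches.length;
}

const sample = "Wait... what? It costs $3.50.";
console.log(countEveryTerminator(sample)); // 6
console.log(countTerminatorRuns(sample));  // 3
```

The lookahead `(?=\s|$)` checks what follows without consuming it, so the decimal point in $3.50 is skipped and "..." is treated as a single sentence end.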

Djordje Djordjevic answered Sep 24 '22 00:09