Dart: Is there a way to split strings into sentences without using Dart's split method?

Tags:

flutter

dart

I'm looking to split a paragraph of text into individual sentences using Dart. The problem is that sentences can end in any of several punctuation marks (e.g. '.', '!', '?'), and in some languages (such as Japanese) they end in other symbols (e.g. '。').

Additionally, Dart's split method removes the delimiter it splits on from the string. For example, "Hello World!" becomes "Hello World" when using the code text.split('! ');
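
To make the behaviour described above concrete, here is a minimal sketch. The sample string is made up for illustration and is not from any real input.

```dart
void main() {
  const text = 'Hello World! How are you?';

  // split() discards whatever the pattern matched, here the "! " separator,
  // so the exclamation mark is missing from the first piece.
  final parts = text.split('! ');

  print(parts); // [Hello World, How are you?]
}
```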

I've looked around at Dart packages available but I'm unable to find anything that does what I'm looking for.

Ideally, I'm looking for something similar to Java's BreakIterator, which lets the programmer choose the locale used to detect punctuation and keeps the punctuation mark when splitting the string into sentences. I'm happy to use a Dart solution that doesn't detect sentence endings automatically based on locale, but in that case I would like to be able to define all the sentence endings to look for when splitting a string.

Any help is appreciated. Thank you in advance.

Oliver Williams asked Oct 28 '25 10:10


1 Answer

It can be done using a regular expression, something like this:

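  void main() {
    String str1 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. In vulputate odio eros, sit amet ultrices ipsum auctor sed. Mauris in faucibus elit. Nulla quam orci? ultrices a leo a, feugiat pharetra ex. Nunc et ipsum lorem. Integer quis congue nisi! In et sem eget leo ullamcorper consectetur dignissim vitae massa。Nam quis erat ac tellus laoreet posuere. Vivamus eget sapien eget neque euismod mollis.";

    // Regular expression: a run of word characters, whitespace, commas or
    // apostrophes, followed by any closing punctuation and trailing whitespace.
    // Note: \w only matches ASCII letters, digits and underscore, so the
    // sentence body itself must be ASCII for this pattern to match it.
    RegExp re = RegExp(r"(\w|\s|,|')+[。.?!]*\s*");

    // Get all the matches:
    Iterable<RegExpMatch> matches = re.allMatches(str1);

    // Iterate over all matches:
    for (RegExpMatch m in matches) {
      // group(0) is the whole match; it is never null for a successful match.
      String match = m.group(0)!;
      print("match: $match");
    }
  }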

Output:

// match: Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
// match: In vulputate odio eros, sit amet ultrices ipsum auctor sed. 
// match: Mauris in faucibus elit. 
// match: Nulla quam orci? 
// match: ultrices a leo a, feugiat pharetra ex. 
// match: Nunc et ipsum lorem. 
// match: Integer quis congue nisi! 
// match: In et sem eget leo ullamcorper consectetur dignissim vitae massa。
// match: Nam quis erat ac tellus laoreet posuere. 
// match: Vivamus eget sapien eget neque euismod mollis.
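
If you would rather list the sentence endings yourself, as the question asks, one possible variation is to search only for the terminators and slice the original string at each one. This is a rough sketch rather than a tested library: the function name `splitSentences` and its `terminators` parameter are made up for illustration, and the default list is an assumption you would adjust for your languages. Because it never uses `\w`, it also works on non-ASCII text such as Japanese.

```dart
/// Splits [text] into sentences, keeping each sentence's closing punctuation.
/// [terminators] (a hypothetical parameter) lists the strings treated as
/// sentence endings; multi-character entries are allowed.
List<String> splitSentences(
  String text, {
  List<String> terminators = const ['.', '!', '?', '。', '！', '？'],
}) {
  // Escape every terminator so that '.' and '?' are matched literally,
  // then match one or more of them in a row (e.g. "?!" or "...").
  final alternation = terminators.map(RegExp.escape).join('|');
  final enders = RegExp('(?:$alternation)+');

  final sentences = <String>[];
  var start = 0;
  for (final m in enders.allMatches(text)) {
    // Take everything from the previous cut up to the end of the punctuation.
    final sentence = text.substring(start, m.end).trim();
    if (sentence.isNotEmpty) sentences.add(sentence);
    start = m.end;
  }

  // Keep any trailing text that has no closing punctuation.
  final tail = text.substring(start).trim();
  if (tail.isNotEmpty) sentences.add(tail);
  return sentences;
}

void main() {
  const text = 'Hello World! How are you? 今日は晴れです。明日は雨です。';
  for (final s in splitSentences(text)) {
    print(s);
  }
}
```

Note that this is purely punctuation-based: it will also cut at the periods in abbreviations or decimals such as "Dr. Smith" or "3.14", so it is not a replacement for locale-aware sentence detection like BreakIterator.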
Badjio answered Oct 31 '25 02:10
