
dart: split string using regular expression and include delimiters

Tags: regex, dart
In Dart, I would like to split a string using a regular expression and include the matching delimiters in the resulting list. So with the delimiter ., I want the string 123.456.789 to get split into [ 123, ., 456, ., 789 ].

In some languages, like C#, JavaScript, Python and Perl, according to https://stackoverflow.com/a/15668433, this can be done by simply including the delimiters in capturing parentheses. The behaviour seems to be documented at https://ecma-international.org/ecma-262/9.0/#sec-regexp.prototype-@@split.

This doesn't seem to work in Dart, however:

print("123.456.789".split(new RegExp(r"(\.)")));

yields exactly the same thing as without the parentheses. Is there a way to get split() to work like this in Dart? Otherwise I guess it will have to be an allMatches() implementation.

Edit: Using ((?<=\.)|(?=\.)) as the regex, with lookbehind and lookahead, apparently does the job for a single delimiter. I will actually have a bunch of delimiters, and I'm not sure about the efficiency of this method. Can someone advise if it's fine? Legibility is certainly reduced: to allow the delimiters . and ;, would one need ((?<=\.)|(?=\.)|(?<=;)|(?=;)) or ((?<=\.|;)|(?=\.|;))? Testing

print("123.456.789;abc;.xyz.;ABC".split(new RegExp(r"((?<=\.|;)|(?=\.|;))")));

indicates that both work.
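For what it's worth, here is a minimal sketch of the lookaround approach with the delimiters folded into a single character class, which stays readable as more delimiters are added. The function name `splitAroundDelimiters` is made up for illustration, and the pattern assumes a Dart version with lookbehind support.

```dart
/// Splits [input] at the zero-width positions just before and just after
/// each `.` or `;`, so every delimiter ends up as its own list element.
/// Hypothetical helper; to support more delimiters, add them to both
/// character classes.
List<String> splitAroundDelimiters(String input) {
  // (?<=[.;]) matches right after a delimiter, (?=[.;]) right before one.
  final boundary = RegExp(r'(?<=[.;])|(?=[.;])');
  return input.split(boundary);
}
```

Calling `splitAroundDelimiters("123.456.789;abc;.xyz.;ABC")` should give the same list as the test above.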

asked Dec 31 '19 17:12 by Ozzin


2 Answers

There is no direct support for it in the standard library, but it is fairly straightforward to roll your own implementation based on RegExp.allMatches(). For example:

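extension RegExpExtension on RegExp {
  /// Splits [input] on every match of this pattern, keeping each matched
  /// delimiter as its own element between the surrounding pieces.
  List<String> allMatchesWithSep(String input, [int start = 0]) {
    var result = <String>[];
    for (var match in allMatches(input, start)) {
      // Text between the previous match (or [start]) and this match.
      result.add(input.substring(start, match.start));
      // The delimiter itself; group 0 is never null for a successful match.
      result.add(match[0]!);
      start = match.end;
    }
    // Trailing text after the last match.
    result.add(input.substring(start));
    return result;
  }
}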

extension StringExtension on String {
  List<String> splitWithDelim(RegExp pattern) =>
      pattern.allMatchesWithSep(this);
}

void main() {
  print("123.456.789".splitWithDelim(RegExp(r"\.")));
  print(RegExp(r" ").allMatchesWithSep("lorem ipsum dolor sit amet"));
}
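As an alternative sketch (not part of the answer's code), the same loop can be expressed with the built-in String.splitMapJoin(), collecting the pieces in the callbacks and discarding the joined string. The name `splitKeepingDelimiters` is hypothetical.

```dart
/// Returns the parts of [input] between matches of [delimiter], with each
/// matched delimiter kept as its own element. Hypothetical helper built on
/// String.splitMapJoin; the string it returns is ignored.
List<String> splitKeepingDelimiters(String input, Pattern delimiter) {
  final pieces = <String>[];
  input.splitMapJoin(
    delimiter,
    onMatch: (Match m) {
      pieces.add(m[0]!); // the delimiter text
      return '';
    },
    onNonMatch: (String text) {
      pieces.add(text); // text between delimiters (may be empty)
      return '';
    },
  );
  return pieces;
}
```

For example, `splitKeepingDelimiters("123.456.789", RegExp(r"\."))` yields the five pieces 123, ., 456, ., 789. Like the extension above, it emits an empty string wherever two delimiters are adjacent or a delimiter sits at either end of the input.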
answered Oct 16 '22 19:10 by Reimer Behrends


Splitting on single delimiter

Given your initial string:

123.456.789

And expected results (split on and including delimiters):

[123, ., 456, ., 789]

You can come up with the following regex:

(?!^|$)\b

This matches every word-boundary position except the start and end of the line.
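In Dart, a minimal sketch of this could look as follows. The function name `splitAtWordBoundaries` is only illustrative, and the approach assumes the non-delimiter parts consist of word characters (letters, digits, underscore).

```dart
/// Splits [input] at every word boundary except the very start and end,
/// so runs of word characters and runs of non-word characters alternate.
/// Hypothetical helper for illustration.
List<String> splitAtWordBoundaries(String input) {
  final boundary = RegExp(r'(?!^|$)\b');
  return input.split(boundary);
}
```

Note that two adjacent delimiters have no word boundary between them, so `splitAtWordBoundaries("456;.xyz")` keeps `;.` together as one element.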


Splitting on multiple delimiters

Now for your edit, given the following string:

123.456.789;abc;.xyz.;ABC

You'd like the expected results (split on and including multiple delimiters):

[123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]

Either of the following regexes works (adapted from the first by adding an alternation):

(?!^|$)\b|(?!\w)\B(?!\w)
(?!^|$)\b|(?=\W)\B(?=\W)

# the long way (with case-insensitive matching) - allows underscore _ as delimiter
(?!^|$)(?:(?<=[a-z\d])(?![a-z\d])|(?<![a-z\d])(?=[a-z\d])|(?<![a-z\d])(?![a-z\d]))

These match every word-boundary position except the start and end of the line, or any position that is not a word boundary and is not followed by a word character, which is the gap between two adjacent non-word characters such as ; and . in ;. above.

Note: The lookbehind-based pattern (the long way) requires Dart 2.3.0 or later, which is when lookbehind support was added. The \b and \B patterns use only lookahead.
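A short Dart sketch using the first of the two-part patterns, under the same assumption that the non-delimiter parts are made of word characters. The name `splitOnAnyDelimiter` is invented for this example.

```dart
/// Splits [input] at word boundaries (excluding the start and end) and also
/// between two adjacent non-word characters, so each delimiter character
/// becomes its own element. Hypothetical helper for illustration.
List<String> splitOnAnyDelimiter(String input) {
  final boundary = RegExp(r'(?!^|$)\b|(?!\w)\B(?!\w)');
  return input.split(boundary);
}
```

With the sample string, `splitOnAnyDelimiter("123.456.789;abc;.xyz.;ABC")` produces the expected list shown earlier, with ; and . separated even where they are adjacent.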

answered Oct 16 '22 17:10 by ctwheels