In Dart, I would like to split a string using a regular expression and include the matching delimiters in the resulting list. So with the delimiter ".", I want the string "123.456.789" to be split into [123, ., 456, ., 789].
In some languages, like C#, JavaScript, Python and Perl, according to https://stackoverflow.com/a/15668433, this can be done by simply including the delimiters in capturing parentheses. The behaviour seems to be documented at https://ecma-international.org/ecma-262/9.0/#sec-regexp.prototype-@@split.
This doesn't seem to work in Dart, however:
print("123.456.789".split(new RegExp(r"(\.)")));
yields [123, 456, 789], exactly the same as without the parentheses. Is there a way to get split() to work like this in Dart? Otherwise I guess it will have to be an allMatches() implementation.
Edit: Using ((?<=\.)|(?=\.)) as the regex apparently does the job for a single delimiter, with lookbehind and lookahead. I will actually have a bunch of delimiters, and I'm not sure about the efficiency of this method. Can someone advise whether it's fine? Legibility is certainly reduced: to allow the delimiters . and ;, would one need
((?<=\.)|(?=\.)|(?<=;)|(?=;))
or
((?<=\.|;)|(?=\.|;))
?
Testing
print("123.456.789;abc;.xyz.;ABC".split(new RegExp(r"((?<=\.|;)|(?=\.|;))")));
indicates that both work.
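One way to keep the lookaround approach readable as delimiters are added is to generate the pattern from a list of literal delimiters. This is a sketch, not code from the thread: the helper names lookaroundDelimiterPattern and splitKeepingDelimiters are hypothetical, while RegExp.escape is the dart:core static method that escapes regex metacharacters. It assumes a Dart version with lookbehind support.

```dart
// Hypothetical helper: builds `(?<=d1|d2|...)|(?=d1|d2|...)` from a list of
// literal delimiters, so the regex does not have to be written by hand.
RegExp lookaroundDelimiterPattern(List<String> delimiters) {
  // RegExp.escape makes each delimiter match literally (e.g. "." -> "\.").
  final alternation = delimiters.map(RegExp.escape).join('|');
  // Zero-width match right after a delimiter, or right before one.
  return RegExp('(?<=$alternation)|(?=$alternation)');
}

// Hypothetical wrapper: String.split on zero-width matches keeps the
// delimiters as separate list elements.
List<String> splitKeepingDelimiters(String input, List<String> delimiters) =>
    input.split(lookaroundDelimiterPattern(delimiters));

void main() {
  // [123, ., 456, ., 789]
  print(splitKeepingDelimiters('123.456.789', ['.']));
  // [123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]
  print(splitKeepingDelimiters('123.456.789;abc;.xyz.;ABC', ['.', ';']));
}
```

When every delimiter is a single character, a character class such as [.;] inside both lookarounds is an equivalent, shorter alternative to the alternation.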
There is no direct support for this in the standard library, but it is fairly straightforward to roll your own implementation based on RegExp.allMatches(). For example:
extension RegExpExtension on RegExp {
  // Like allMatches, but returns the text between matches as well as
  // the matches themselves, in order.
  List<String> allMatchesWithSep(String input, [int start = 0]) {
    var result = <String>[];
    for (var match in allMatches(input, start)) {
      result.add(input.substring(start, match.start)); // text before the match
      result.add(match[0]!); // the delimiter (group 0 is never null)
      start = match.end;
    }
    result.add(input.substring(start)); // text after the last match
    return result;
  }
}

extension StringExtension on String {
  List<String> splitWithDelim(RegExp pattern) =>
      pattern.allMatchesWithSep(this);
}

void main() {
  print("123.456.789".splitWithDelim(RegExp(r"\.")));
  print(RegExp(r" ").allMatchesWithSep("lorem ipsum dolor sit amet"));
}
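The same allMatches idea extends to several delimiters by using a character class. One difference from the lookaround split is that two adjacent delimiters such as ";." produce an empty string between them. The sketch below is a self-contained, top-level variant (the name splitWithDelimiters and the dropEmpty flag are hypothetical) that filters those empty pieces out by default.

```dart
// Hypothetical top-level variant of allMatchesWithSep that can drop the
// empty strings produced between adjacent delimiters (and at the ends).
List<String> splitWithDelimiters(String input, RegExp pattern,
    {bool dropEmpty = true}) {
  final result = <String>[];
  var start = 0;
  for (final match in pattern.allMatches(input)) {
    result.add(input.substring(start, match.start)); // text before delimiter
    result.add(match[0]!); // the delimiter itself
    start = match.end;
  }
  result.add(input.substring(start)); // text after the last delimiter
  return dropEmpty ? result.where((s) => s.isNotEmpty).toList() : result;
}

void main() {
  const input = '123.456.789;abc;.xyz.;ABC';
  final delimiters = RegExp(r'[.;]');
  // [123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]
  print(splitWithDelimiters(input, delimiters));
  // Keeps the empty strings between ";." and ".;".
  print(splitWithDelimiters(input, delimiters, dropEmpty: false));
}
```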
Given your initial string:
123.456.789
And expected results (split on and including delimiters):
[123, ., 456, ., 789]
You can come up with the following regex:
(?!^|$)\b
This matches positions that are word boundaries, except at the start or end of the input.
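A quick way to try the word-boundary pattern in Dart is to pass it to String.split. This sketch also shows two properties of the pattern: any non-word character acts as a delimiter (not just "."), and two adjacent delimiters such as ";." stay together, because there is no word boundary between them. The variable name boundary is illustrative.

```dart
// Zero-width match at every word boundary except the start and end of input.
final boundary = RegExp(r'(?!^|$)\b');

void main() {
  print('123.456.789'.split(boundary)); // [123, ., 456, ., 789]
  // Any non-word character is treated as a delimiter, e.g. a space.
  print('12 34'.split(boundary)); // [12,  , 34]
  // No word boundary between ';' and '.', so they are not separated.
  print('1;.2'.split(boundary)); // [1, ;., 2]
}
```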
Now for your edit, given the following string:
123.456.789;abc;.xyz.;ABC
You'd like the expected results (split on and including multiple delimiters):
[123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]
You can use either of the following regexes (adapted from the first one by adding an alternation).
See the regex sample here (I simulate the split by substituting a newline character, for display purposes).
Either of the following works:
(?!^|$)\b|(?!\w)\B(?!\w)
(?!^|$)\b|(?=\W)\B(?=\W)
# the long way (with case-insensitive matching) - allows underscore _ as delimiter
(?!^|$)(?:(?<=[a-z\d])(?![a-z\d])|(?<![a-z\d])(?=[a-z\d])|(?<![a-z\d])(?![a-z\d]))
These match positions that are word boundaries, except at the start or end of the input; or positions that are not word boundaries but sit between two non-word characters (such as between ; and .).
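The three patterns above can be checked against the multi-delimiter string with String.split. In this sketch the long form is given caseSensitive: false, which is what the "case-insensitive matching" comment calls for: without it, [a-z] would not cover "ABC" and the pattern would split between its letters. The list name patterns is illustrative.

```dart
// The answer's three patterns as Dart RegExp objects (raw strings, so `$`
// is not treated as interpolation).
final patterns = <RegExp>[
  RegExp(r'(?!^|$)\b|(?!\w)\B(?!\w)'),
  RegExp(r'(?!^|$)\b|(?=\W)\B(?=\W)'),
  // Long form: case-insensitive so that [a-z] also matches A-Z.
  RegExp(
    r'(?!^|$)(?:(?<=[a-z\d])(?![a-z\d])|(?<![a-z\d])(?=[a-z\d])|(?<![a-z\d])(?![a-z\d]))',
    caseSensitive: false,
  ),
];

void main() {
  const input = '123.456.789;abc;.xyz.;ABC';
  for (final pattern in patterns) {
    // Each prints [123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]
    print(input.split(pattern));
  }
}
```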
Note: This will work in Dart 2.3.0 and up since lookbehind support was added (see here for more info).