
How to split using a prefix character using regular expressions?

I would like to split the example string:

~Peter~Lois~Chris~Meg~Stewie

on the character ~ and have the result be

Peter
Lois
Chris
Meg
Stewie

Using a standard string split function in JavaScript or C#, the first result is of course an empty string. I'd like to avoid simply discarding the first result, because the first field may legitimately be an empty string.

I've been fiddling around with a regular expression and I'm stumped. I'm sure somebody has come across an elegant solution to this.

Craig asked Feb 01 '09 02:02




2 Answers

For your requirements, I see two options:

(1) Remove the initial prefix character, if present.

(2) Use a full regular expression to separate the string.

Both are illustrated in this code:

using System;
using System.Linq;
using System.Text.RegularExpressions;

class APP
{
    static void Main()
    {
        string s = "~Peter~Lois~Chris~Meg~Stewie";

        // #1 - Trim+Split
        // Note: TrimStart('~') strips every leading '~', so an input such as
        // "~~Lois" would lose its empty first field with this approach.
        Console.WriteLine("[#1 - Trim+Split]");
        string[] result = s.TrimStart('~').Split('~');
        foreach (string t in result) { Console.WriteLine("'" + t + "'"); }

        // #2 - Regex: each match is one '~' followed by a captured field
        Console.WriteLine("[#2 - Regex]");
        Regex RE = new Regex("~([^~]*)");
        MatchCollection theMatches = RE.Matches(s);
        foreach (Match match in theMatches) { Console.WriteLine("'" + match.Groups[1].Value + "'"); }

        // #3 - Regex with LINQ [ modified from @ccook's code ]
        Console.WriteLine("[#3 - Regex with LINQ]");
        Regex.Matches(s, "~([^~]*)")
            .OfType<Match>()
            .ToList()
            .ForEach(m => Console.WriteLine("'" + m.Groups[1].Value + "'"));
    }
}

The regular expression in #2 matches the delimiter character followed by a match group containing zero or more non-delimiter characters. The resulting matches are the delimited strings (including any empty strings). For each match, "match.Value" is the entire matched string including the leading delimiter, and "match.Groups[1].Value" is the first match group, containing the delimiter-free string.

For completeness, the third variant (#3) applies the same regular expression as #2, but in a LINQ coding style.
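
Since the question also mentions JavaScript, here is a minimal sketch of the same pattern there, assuming an environment that supports String.prototype.matchAll (ES2020). The helper name prefixedFields is hypothetical and only for illustration; it is not part of the original answer.

```javascript
// Sketch: same pattern as #2, one "~" followed by a captured run of non-"~" characters.
// prefixedFields is a hypothetical helper name chosen for this example.
const prefixedFields = (s) =>
  Array.from(s.matchAll(/~([^~]*)/g), (m) => m[1]); // m[1] is the first capture group

const names = prefixedFields("~Peter~Lois~Chris~Meg~Stewie");
console.log(names); // ["Peter", "Lois", "Chris", "Meg", "Stewie"]

// An empty first field is preserved rather than discarded.
const withEmptyFirst = prefixedFields("~~Lois");
console.log(withEmptyFirst); // ["", "Lois"]
```

Because every field is anchored to its own leading "~", there is no spurious empty first element to throw away, and a first field that really is empty still shows up in the result.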

If you are struggling with regular expressions, I highly recommend Mastering Regular Expressions, Third Edition by Jeffrey E. F. Friedl. It is, by far, the best aid to understanding regular expressions and later serves as an excellent reference or refresher as needed.

rivy answered Sep 28 '22 15:09



In C#, this seems to get what you want:

"~Peter~Lois~Chris~Meg~Stewie".Split("~".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
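
Note that RemoveEmptyEntries drops every empty entry, not only the one produced by the leading prefix. The following JavaScript sketch (JavaScript being the other language named in the question) contrasts that behavior with removing exactly one leading "~" before splitting. The names dropEmpties and stripOnePrefix are hypothetical and used only for illustration.

```javascript
const input = "~~Lois~Chris"; // the first field is intentionally empty

// Rough analogue of StringSplitOptions.RemoveEmptyEntries: every empty field is dropped.
const dropEmpties = input.split("~").filter((field) => field !== "");
console.log(dropEmpties); // ["Lois", "Chris"]

// Alternative: remove exactly one leading "~", then split, so genuine empty fields survive.
const stripOnePrefix = (s) => (s.startsWith("~") ? s.slice(1) : s).split("~");
const keepEmpties = stripOnePrefix(input);
console.log(keepEmpties); // ["", "Lois", "Chris"]
```

Which variant fits depends on whether empty fields carry meaning in the data, as the question suggests they might.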
Jay Bazuzi answered Sep 28 '22 15:09
