Regex.Split() sentence to words while preserving whitespace

Tags: c#, regex, split

I'm using Regex.Split() to turn the user's input into a list of individual words, but at the moment it removes any spaces they add. I would like it to keep the whitespace.

string[] newInput = Regex.Split(updatedLine, @"\s+");
Joel asked Nov 20 '11 20:11


1 Answer

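using System;
using System.Text.RegularExpressions;

string text = "This            is some text";
// Split at every point that follows a non-whitespace character and precedes whitespace.
var splits = Regex.Split(text, @"(?=(?<=[^\s])\s+)");

foreach (string item in splits)
    Console.Write(item);
Console.WriteLine(splits.Length); // Length on the array avoids needing System.Linq for Count()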

This gives you 4 splits, each of which keeps all of its leading whitespace.

(?=\s+)

This means: split at any point that has whitespace ahead of it. Used alone, it creates 15 splits on the sample text, because within a run of repeated spaces every space is itself a point with whitespace ahead.
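
To make that count concrete, here is a minimal sketch. It assumes the sample text has twelve spaces between "This" and "is" (built with new string(' ', 12) so the number is explicit); the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: twelve spaces between "This" and "is", as in the sample text.
string sample = "This" + new string(' ', 12) + "is some text";

// Lookahead only: there is a split point in front of every single whitespace character.
string[] naive = Regex.Split(sample, @"(?=\s+)");

Console.WriteLine(naive.Length); // 15: "This", eleven single spaces, " is", " some", " text"
```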

(?=(?<=[^\s])\s+)

This means: split at a point that has a non-whitespace character before it and whitespace ahead of it.
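
A short sketch of what this pattern produces, printing each piece in brackets so the leading whitespace is visible. The shorter sample string and the variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "This   is some text"; // illustrative input: three spaces after "This"

// Split only where a non-whitespace character is behind and whitespace is ahead,
// so a run of spaces yields a single split point at its start.
string[] pieces = Regex.Split(sample, @"(?=(?<=[^\s])\s+)");

foreach (string piece in pieces)
    Console.WriteLine($"[{piece}]");
// [This]
// [   is]
// [ some]
// [ text]
```

Because the match is zero-width, nothing is consumed: string.Concat(pieces) gives back the original input.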

If the text starts with whitespace and you want the start of the string to count as a split point too, you can modify the expression to the following:

(?=(?<=^|[^\s])\s+)

This means a run of whitespace must be preceded by either a non-whitespace character or the start of the string.
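
As a hedged illustration of the difference, the sketch below runs both patterns on a made-up input with two leading spaces. As far as I can tell from the documented behaviour of Regex.Split, a match at the very start of the input adds an empty string as the first element, so the extra ^ alternative shows up as an empty leading entry while the leading spaces stay attached to the first word.

```csharp
using System;
using System.Text.RegularExpressions;

string padded = "  hello world"; // hypothetical input with two leading spaces

// Original pattern: the leading spaces have no non-whitespace character behind them.
string[] basic = Regex.Split(padded, @"(?=(?<=[^\s])\s+)");

// Modified pattern: the start of the string also qualifies as a split point.
string[] anchored = Regex.Split(padded, @"(?=(?<=^|[^\s])\s+)");

Console.WriteLine(basic.Length);    // 2: "  hello", " world"
Console.WriteLine(anchored.Length); // 3: "", "  hello", " world"
```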

Muhammad Hasan Khan answered Sep 25 '22 09:09