To split a string into lines in Python, use the built-in str.split() method, which returns a list of the substrings obtained by breaking the string at the specified separator. In this tutorial, a line is just a string: Python has no separate type for lines, so a multi-line text is one string with newline characters in it.
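As a minimal sketch of the above (the sample text is made up for illustration), this compares split("\n") with the related built-in splitlines(), which the paragraph does not mention but which also handles Windows-style endings:

```python
text = "first line\nsecond line\r\nthird line\n"

# split("\n") breaks only on "\n": the "\r" of a Windows-style ending stays
# attached, and the trailing newline yields a final empty string.
by_newline = text.split("\n")

# splitlines() recognizes "\n", "\r\n", and "\r" (among others) and does not
# produce a trailing empty string.
lines = text.splitlines()

print(by_newline)  # ['first line', 'second line\r', 'third line', '']
print(lines)       # ['first line', 'second line', 'third line']
```

If the input may come from different platforms, splitlines() is usually the less surprising choice.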
In JavaScript, to split a string by newline, call the split() method and pass it the regular expression /\r?\n/ as the parameter. The method splits the string at each newline (optionally preceded by a carriage return) and returns an array containing the substrings.
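The same \r?\n pattern can be tried in Python with re.split. This is only a sketch with made-up sample strings; the second call adds |\r as an assumption that lone carriage returns should also count as line breaks:

```python
import re

text = "alpha\nbeta\r\ngamma"

# \r?\n matches "\n" and "\r\n", so both line-ending styles are consumed whole.
parts = re.split(r"\r?\n", text)
print(parts)  # ['alpha', 'beta', 'gamma']

# A lone "\r" (older Mac style) is not matched by \r?\n; adding |\r covers it.
old_mac = "one\rtwo\r\nthree\nfour"
all_parts = re.split(r"\r?\n|\r", old_mac)
print(all_parts)  # ['one', 'two', 'three', 'four']
```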
If it looks ugly, just remove the unnecessary ToCharArray call.
If you want to split by either \n or \r, you've got two options:
Use an array literal – but this will give you empty lines for Windows-style line endings \r\n:
var result = text.Split(new [] { '\r', '\n' });
Use a regular expression, as indicated by Bart:
var result = Regex.Split(text, "\r\n|\r|\n");
If you want to preserve empty lines, why do you explicitly tell C# to throw them away? (StringSplitOptions parameter) – use StringSplitOptions.None instead.
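Alternatively, read the text line by line with a StringReader (requires using System.IO), whose ReadLine treats \n, \r, and \r\n as line terminators:
using (StringReader sr = new StringReader(text)) {
    string line;
    // ReadLine returns null once the end of the text is reached
    while ((line = sr.ReadLine()) != null) {
        // do something with line
    }
}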
This works great and is faster than Regex:
input.Split(new[] {"\r\n", "\r", "\n"}, StringSplitOptions.None)
It is important to have "\r\n" first in the array so that it's taken as one line break. The above gives the same results as either of these Regex solutions:
Regex.Split(input, "\r\n|\r|\n")
Regex.Split(input, "\r?\n|\r")
Except that Regex turns out to be roughly 8 times slower. Here's my test:
Action<Action> measure = (Action func) => {
    var start = DateTime.Now;
    for (int i = 0; i < 100000; i++) {
        func();
    }
    var duration = DateTime.Now - start;
    Console.WriteLine(duration);
};
var input = "";
for (int i = 0; i < 100; i++)
{
    input += "1 \r2\r\n3\n4\n\r5 \r\n\r\n 6\r7\r 8\r\n";
}
measure(() =>
    input.Split(new[] {"\r\n", "\r", "\n"}, StringSplitOptions.None)
);
measure(() =>
    Regex.Split(input, "\r\n|\r|\n")
);
measure(() =>
    Regex.Split(input, "\r?\n|\r")
);
Output:
00:00:03.8527616
00:00:31.8017726
00:00:32.5557128
And here's the extension method:
public static class StringExtensionMethods
{
    public static IEnumerable<string> GetLines(this string str, bool removeEmptyLines = false)
    {
        return str.Split(new[] { "\r\n", "\r", "\n" },
            removeEmptyLines ? StringSplitOptions.RemoveEmptyEntries : StringSplitOptions.None);
    }
}
Usage:
input.GetLines() // keeps empty lines
input.GetLines(true) // removes empty lines
You could use Regex.Split:
string[] tokens = Regex.Split(input, @"\r?\n|\r");
Edit: added |\r to account for (older) Mac line terminators.
If you want to keep empty lines just remove the StringSplitOptions.
var result = input.Split(System.Environment.NewLine.ToCharArray());