To split a string into lines in Python, use the string split() method. split() is a built-in method that returns a list of substrings after breaking the given string at the specified separator. In this tutorial, a line is treated as a string, because Python has no separate line type.
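As a minimal sketch of the Python approach described above, the snippet below compares split("\n") with the built-in splitlines() method on a string that mixes Unix and Windows line endings. The sample text and the variable names are illustrative assumptions, not part of the original tutorial.

```python
# Sample text mixing a Unix ending (\n) and a Windows ending (\r\n); illustrative only.
text = "first\nsecond\r\nthird\n"

# split("\n") cuts only on "\n": the "\r" stays attached to "second",
# and the trailing newline produces an empty final item.
by_newline = text.split("\n")
print(by_newline)  # ['first', 'second\r', 'third', '']

# splitlines() recognizes \n, \r\n and \r, and does not add an empty
# item for a trailing line break.
by_lines = text.splitlines()
print(by_lines)  # ['first', 'second', 'third']
```

If the input may contain Windows-style line endings, splitlines() is usually the safer choice, since split("\n") leaves the carriage returns in place.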
To split a string by newline in JavaScript, call the split() method, passing it the regular expression /\r?\n/ as a parameter. The method splits the string at each newline (optionally preceded by a carriage return) and returns an array containing the substrings.
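For comparison, the same pattern can be tried with Python's re.split. This is only a sketch: the sample text and variable names are assumptions made for illustration. It also shows that \r?\n leaves a lone \r untouched, whereas the alternation \r\n|\r|\n handles all three endings.

```python
import re

# Sample text with Unix (\n), Windows (\r\n) and old Mac (\r) endings; illustrative only.
text = "one\ntwo\r\nthree\rfour"

# \r?\n matches "\n" and "\r\n", but not a lone "\r".
unix_windows = re.split(r"\r?\n", text)
print(unix_windows)  # ['one', 'two', 'three\rfour']

# Listing \r\n first makes the two-character ending match as a single break.
all_endings = re.split(r"\r\n|\r|\n", text)
print(all_endings)  # ['one', 'two', 'three', 'four']
```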
If it looks ugly, just remove the unnecessary ToCharArray call.
If you want to split by either \n or \r, you've got two options:
Use an array literal, but this will give you empty lines for Windows-style line endings (\r\n):
var result = text.Split(new [] { '\r', '\n' });
Use a regular expression, as indicated by Bart:
var result = Regex.Split(text, "\r\n|\r|\n");
If you want to preserve empty lines, why do you explicitly tell C# to throw them away (the StringSplitOptions parameter)? Use StringSplitOptions.None instead.
using (StringReader sr = new StringReader(text)) {
    string line;
    // ReadLine returns null once the end of the text is reached
    while ((line = sr.ReadLine()) != null) {
        // do something
    }
}
This works great and is faster than Regex:
input.Split(new[] {"\r\n", "\r", "\n"}, StringSplitOptions.None)
It is important to have "\r\n" first in the array so that it's taken as one line break. The above gives the same results as either of these Regex solutions:
Regex.Split(input, "\r\n|\r|\n")
Regex.Split(input, "\r?\n|\r")
Except that Regex turns out to be roughly 8 times slower. Here's my test:
Action<Action> measure = (Action func) => {
    var start = DateTime.Now;
    for (int i = 0; i < 100000; i++) {
        func();
    }
    var duration = DateTime.Now - start;
    Console.WriteLine(duration);
};
var input = "";
for (int i = 0; i < 100; i++)
{
    input += "1 \r2\r\n3\n4\n\r5 \r\n\r\n 6\r7\r 8\r\n";
}
measure(() =>
    input.Split(new[] {"\r\n", "\r", "\n"}, StringSplitOptions.None)
);
measure(() =>
    Regex.Split(input, "\r\n|\r|\n")
);
measure(() =>
    Regex.Split(input, "\r?\n|\r")
);
Output:
00:00:03.8527616
00:00:31.8017726
00:00:32.5557128
And here's the extension method:
public static class StringExtensionMethods
{
    public static IEnumerable<string> GetLines(this string str, bool removeEmptyLines = false)
    {
        return str.Split(new[] { "\r\n", "\r", "\n" },
            removeEmptyLines ? StringSplitOptions.RemoveEmptyEntries : StringSplitOptions.None);
    }
}
Usage:
input.GetLines() // keeps empty lines
input.GetLines(true) // removes empty lines
You could use Regex.Split:
string[] tokens = Regex.Split(input, @"\r?\n|\r");
Edit: added |\r to account for (older) Mac line terminators.
If you want to keep empty lines, just remove the StringSplitOptions argument:
var result = input.Split(System.Environment.NewLine.ToCharArray());