I am working on an application which imports thousands of lines where every line has a format like this:
|* 9070183020 |04.02.2011 |107222 |M/S SUNNY MEDICOS |GHAZIABAD | 32,768.00 |
I am using the following Regex to split the lines into the data I need:
Regex lineSplitter = new Regex(@"(?:^\|\*|\|)\s*(.*?)\s+(?=\|)");
string[] columns = lineSplitter.Split(data);
foreach (string c in columns)
Console.Write("[" + c + "] ");
This is giving me the following result:
[] [9070183020] [] [04.02.2011] [] [107222] [] [M/S SUNNY MEDICOS] [] [GHAZIABAD] [] [32,768.00] [|]
Now I have two questions.
1. How do I remove the empty results? I know I can use:
string[] columns = lineSplitter.Split(data).Where(s => !string.IsNullOrEmpty(s)).ToArray();
but is there any built-in method to remove the empty results?
2. How can I remove the last pipe?
Thanks for any help.
Regards,
Yogesh.
EDIT:
I think my question was a little misunderstood. It was never about how I can do it. It was only about how I can do it by changing the Regex in the above code.
I know that I can do it in many ways. I have already done it with the code mentioned above using a Where clause, and with an alternate way which is also more than twice as fast:
Regex regex = new Regex(@"(^\|\*\s*)|(\s*\|\s*)");
data = regex.Replace(data, "|");
string[] columns = data.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
Secondly, as a test case, my system can parse 92k+ such lines in less than 1.5 seconds with the original method and in less than 700 milliseconds with the second method. In real cases I will never see more than a couple of thousand lines, so I don't think I need to worry about speed here. In my opinion, thinking about speed in this case is premature optimization.
I have found the answer to my first question: it cannot be done with Split, as there is no such option built in.
Still looking for an answer to my second question.
Regex lineSplitter = new Regex(@"[\s*\*]*\|[\s*\*]*");
var columns = lineSplitter.Split(data).Where(s => s != String.Empty);
or you could simply do:
string[] columns = data.Split(new char[] {'|'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string c in columns) this.textBox1.Text += "[" + c.Trim(' ', '*') + "] " + "\r\n";
And no, Regex.Split has no option to remove empty entries the way String.Split does.
You can also use matches.
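As a rough sketch of the matching approach, the fields can be captured directly instead of splitting on the separators, which avoids both the empty entries and the trailing pipe. The class name LineParser, the method ParseColumns, and the pattern itself are illustrative assumptions, not code from the question:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineParser
{
    // Hypothetical pattern: a pipe, an optional '*', optional whitespace,
    // then the field (captured lazily), then optional whitespace that must
    // be followed by the next pipe. The lookahead leaves that pipe
    // unconsumed so it can start the next match.
    private static readonly Regex FieldPattern =
        new Regex(@"\|\*?\s*([^|]*?)\s*(?=\|)");

    // Returns the trimmed field values of one line.
    public static string[] ParseColumns(string line)
    {
        return FieldPattern.Matches(line)
            .Cast<Match>()                   // MatchCollection is non-generic on older frameworks
            .Select(m => m.Groups[1].Value)  // group 1 holds the field without padding
            .ToArray();
    }
}
```

Calling LineParser.ParseColumns(data) on the sample line should give the six values with no empty strings and no trailing "|", because the final pipe has no following pipe to satisfy the lookahead. Note that under this pattern a genuinely empty column (two pipes with only whitespace between them) would still come back as an empty string.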
Don't use a regex at all in your case. It doesn't seem you need one and regexes are much slower (and have a much higher overhead) than directly using the string functions.
So use something like:
// An array cannot be declared const in C#; a plain local (or a static readonly field) works.
char[] splitChars = { '|' };
string[] splitData = data.Split(splitChars, StringSplitOptions.RemoveEmptyEntries);
I think this may work as an equivalent to remove empty strings:
string[] splitter = Regex.Split(textvalue,@"\s").Where(s => s != String.Empty).ToArray<string>();