I have a string in the following format:
string instance = "{112,This is the first day 23/12/2009},{132,This is the second day 24/12/2009}";

private void parsestring(string input)
{
    string[] tokens = input.Split(','); // I thought this would split on the , separating the {}
    foreach (string item in tokens)     // but that doesn't seem to be what it is doing
    {
        Console.WriteLine(item);
    }
}
My desired output is:
112,This is the first day 23/12/2009
132,This is the second day 24/12/2009
But currently I get this:
{112
This is the first day 23/12/2009
{132
This is the second day 24/12/2009
I am very new to C# and any help would be appreciated.
Don't fixate on Split() being the solution! This is simple to parse without it. Regex answers are probably also OK, but in terms of raw efficiency I imagine a small hand-written parser would do the trick.
using System.Collections.Generic;

IEnumerable<string> Parse(string input)
{
    var results = new List<string>();
    int startIndex = 0;   // index just after the most recent '{'
    int currentIndex = 0;
    while (currentIndex < input.Length)
    {
        var currentChar = input[currentIndex];
        if (currentChar == '{')
        {
            // The next character is the start of a result.
            startIndex = currentIndex + 1;
        }
        else if (currentChar == '}')
        {
            // Everything from startIndex up to the character before '}' is one result.
            int endIndex = currentIndex - 1;
            int length = endIndex - startIndex + 1;
            results.Add(input.Substring(startIndex, length));
        }
        currentIndex++;
    }
    return results;
}
It's not short on lines, but it iterates over the input once and performs only one allocation per result (plus the list). With a little tweaking, a C# 8 version using the Index/Range types could probably cut allocations further, but this is likely good enough.
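As a rough sketch of that tweak, assuming C# 8 and .NET Core 3.0 or later (for System.Range), the loop can record where each body lives instead of copying it. The class and method names (BraceRangeParser.ParseRanges) are hypothetical, and the list itself is still allocated; only the per-result strings go away.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: same single pass as the parser above, but it records
// the position of each "{...}" body as a Range instead of calling Substring.
public static class BraceRangeParser
{
    public static List<Range> ParseRanges(string input)
    {
        var ranges = new List<Range>();
        int startIndex = 0;
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '{')
            {
                startIndex = i + 1;        // body starts after the '{'
            }
            else if (input[i] == '}')
            {
                ranges.Add(startIndex..i); // end is exclusive, so the '}' is left out
            }
        }
        return ranges;
    }
}
```

A caller can then slice with input.AsSpan()[range], which does not allocate, and only call ToString() on the pieces it really needs as strings.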
You could spend a whole day figuring out the regex, but this logic is as simple as it comes: when you see {, note that the next character is the start of a result; when you see }, treat everything from the last noted start up to the character before this one as a result.
This won't catch mismatched brackets: for a string like "}}{" it doesn't throw, it silently returns wrong results (an empty string and "}"). You didn't ask for handling those cases, but it's not too hard to improve this logic to detect them and either report an error or recover.
For example, you could reset startIndex to -1 whenever } is found. From there, if you find { while startIndex >= 0, you've found "{{"; if you find } while startIndex == -1, you've found "}}"; and if you exit the loop with startIndex >= 0, that's an opening { with no closing }. That leaves a string like "}whoops" uncovered, since startIndex starts at 0, but it can be handled by initializing startIndex to, say, -2 and checking for that specifically. Do that with a regex, and you'll have a headache.
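Here is one way the sentinel scheme could look, as a sketch rather than a finished implementation. The names (CheckedBraceParser, NoBraceSeenYet, OutsideBraces) are hypothetical, and throwing FormatException is an assumption; you could just as well log and skip.

```csharp
using System;
using System.Collections.Generic;

public static class CheckedBraceParser
{
    // Sentinel values for startIndex; any value >= 0 means "inside braces".
    private const int NoBraceSeenYet = -2; // nothing opened or closed so far
    private const int OutsideBraces = -1;  // the last brace seen was a '}'

    public static List<string> Parse(string input)
    {
        var results = new List<string>();
        int startIndex = NoBraceSeenYet;
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '{')
            {
                if (startIndex >= 0) // "{{": previous group never closed
                    throw new FormatException($"Unexpected '{{' at index {i}.");
                startIndex = i + 1;
            }
            else if (c == '}')
            {
                if (startIndex == NoBraceSeenYet) // e.g. "}whoops"
                    throw new FormatException($"'}}' at index {i} appears before any '{{'.");
                if (startIndex == OutsideBraces)  // "}}"
                    throw new FormatException($"Unexpected '}}' at index {i}.");
                results.Add(input.Substring(startIndex, i - startIndex));
                startIndex = OutsideBraces;
            }
        }
        if (startIndex >= 0) // an opening '{' with no closing '}'
            throw new FormatException("Input ends inside an unclosed '{'.");
        return results;
    }
}
```

Characters between groups, such as the separating comma, are simply ignored.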
The main reason I suggest this is that you said "efficiently". icepickle's solution is nice, but Split() makes one allocation per token, and then each TrimX() call that actually removes a brace allocates another string. That's not "efficient"; that's "n + 2 allocations".
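icepickle's answer isn't reproduced here, so the following is only an assumed Split/Trim shape (the class name SplitTrimParser is hypothetical), meant to show where the "n + 2" comes from: one string per token from Split, plus one extra copy each for the first and last tokens, which are the only ones that still carry a brace.

```csharp
using System;

// Assumed shape of a Split/Trim solution, not icepickle's actual code.
public static class SplitTrimParser
{
    public static string[] Parse(string input)
    {
        // Split allocates the result array plus one string per token.
        string[] parts = input.Split(new[] { "},{" }, StringSplitOptions.None);
        for (int i = 0; i < parts.Length; i++)
        {
            // TrimStart/TrimEnd return a new string only when they remove something,
            // which here happens for the first token ('{') and the last token ('}').
            parts[i] = parts[i].TrimStart('{').TrimEnd('}');
        }
        return parts;
    }
}
```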
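Use Regex for this:
using System.Linq;
using System.Text.RegularExpressions;

// Split on "}", optional whitespace, ",", optional whitespace, "{"
string[] tokens = Regex.Split(input, @"\}\s*,\s*\{")
    .Select(i => i.Replace("{", "").Replace("}", "")) // strip the braces left on the first and last tokens
    .ToArray();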
Pattern explanation: the braces and the comma match literally, and \s* matches zero or more whitespace characters, so a separator like "}, {" works as well as "},{".
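To check the whitespace handling described above, here is the same Regex.Split approach wrapped in a method; the name RegexBraceParser.Parse is just for illustration. Note that Replace also removes any braces that appear inside the text itself.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexBraceParser
{
    public static string[] Parse(string input)
    {
        // "}" + optional whitespace + "," + optional whitespace + "{" separates two groups.
        return Regex.Split(input, @"\}\s*,\s*\{")
            // Remove the leftover outer braces (and any braces inside the text).
            .Select(token => token.Replace("{", "").Replace("}", ""))
            .ToArray();
    }
}
```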