
What is the best way to parse this string in C#?

I have a string that I am reading from another system. It's basically one long string representing a list of key/value pairs, with a single space between pairs. It looks like this:

 key:value[space]key:value[space]key:value[space] 

So I wrote this code to parse it:

    string myString = ReadinString();
    string[] tokens = myString.Split(' ');
    foreach (string token in tokens)
    {
        string key = token.Split(':')[0];
        string value = token.Split(':')[1];
        . . .
    }

The issue now is that some of the values have spaces in them, so my "simplistic" split at the top no longer works. How can I still parse out the list of key/value pairs (with space as the separator character) now that I know the value field can also contain spaces? A plain Split doesn't seem like it's going to work anymore.

NOTE: I have now confirmed that keys will NOT have spaces in them, so I only have to worry about the values. Apologies for the confusion.

asked May 31 '11 10:05 by leora




2 Answers

Use this regular expression:

\w+:[\w\s]+(?![\w+:]) 

I tested it on

test:testvalue test2:test value test3:testvalue3 

It returns three matches:

test:testvalue
test2:test value
test3:testvalue3

You can change \w to any character set that can occur in your input.

Code for testing this:

    // Requires: using System; using System.Text.RegularExpressions;
    var regex = new Regex(@"\w+:[\w\s]+(?![\w+:])");
    var test = "test:testvalue test2:test value test3:testvalue3";

    foreach (Match match in regex.Matches(test))
    {
        var key = match.Value.Split(':')[0];
        var value = match.Value.Split(':')[1];

        Console.WriteLine("{0}:{1}", key, value);
    }
    Console.ReadLine();

As Wonko the Sane pointed out, this regular expression will fail on values that contain a colon. If you expect that situation, use \w+:[\w: ]+?(?![\w+:]) as the regular expression instead. It will still fail when a colon in a value is preceded by a space, though... I'll think about a solution to this.
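
One possible way around the colon problem, offered as a sketch rather than as this answer's own solution: capture the value lazily and stop only where a new pair begins, meaning whitespace followed by word characters and a colon. The names KeyValueParser, PairPattern and Parse are made up for illustration, and the sketch assumes keys consist only of \w characters. A value that itself contains " word:" after a space would still be misread as the start of a new pair.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class KeyValueParser
{
    // key   : one or more word characters, followed by ':'
    // value : as few characters as possible, up to either
    //         whitespace + "nextKey:" or the end of the input.
    // Assumption: keys contain only \w characters and never spaces.
    private static readonly Regex PairPattern = new Regex(
        @"(?<key>\w+):(?<value>.*?)(?=\s+\w+:|\s*$)",
        RegexOptions.Singleline);

    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in PairPattern.Matches(input))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["key"].Value,
                m.Groups["value"].Value));
        }
        return pairs;
    }
}
```

For example, KeyValueParser.Parse("test:testvalue test2:test value test3:10:30") should give three pairs with the values "testvalue", "test value" and "10:30". Using named groups also avoids calling Split(':') on the match, which would cut a value at its first colon.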

answered Oct 20 '22 04:10 by Episodex


This cannot work reliably unless you change the separator between pairs from a space to something else, such as a "|".

Consider this:

Alfred Bester:Alfred Bester Alfred:Alfred Bester

  • Is this Key "Alfred Bester" & value "Alfred" or Key "Alfred" & value "Bester Alfred"?
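
Under the question's later note that keys never contain spaces, the ambiguity goes away for simple inputs, and a Split-based parser can still work. The sketch below assumes that any space-separated token with a colon after its first character starts a new pair, and that any other token belongs to the previous value. TokenMerger and Parse are hypothetical names. Runs of several spaces inside a value collapse to one, and a value containing a token such as "a:b" would be misread as a new pair.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class TokenMerger
{
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        string[] tokens = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            int colon = token.IndexOf(':');
            if (colon > 0)
            {
                // Assumption: "something:" at the start of a token marks a new key.
                pairs.Add(new KeyValuePair<string, string>(
                    token.Substring(0, colon),
                    token.Substring(colon + 1)));
            }
            else if (pairs.Count > 0)
            {
                // No key in this token: append it to the previous value.
                var last = pairs[pairs.Count - 1];
                string merged = last.Value.Length == 0 ? token : last.Value + " " + token;
                pairs[pairs.Count - 1] = new KeyValuePair<string, string>(last.Key, merged);
            }
            // Tokens that appear before the first key are skipped.
        }
        return pairs;
    }
}
```

With this rule, TokenMerger.Parse("name:John Smith city:New York") should yield the pairs (name, "John Smith") and (city, "New York"). The example string above would be read as (Bester, "Alfred Bester") and (Alfred, "Alfred Bester"), with the leading "Alfred" dropped because no key precedes it.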
answered Oct 20 '22 06:10 by Carra