I have a string that I am reading from another system. It's basically a long string that represents a list of key value pairs that are separated by a space in between. It looks like this:
key:value[space]key:value[space]key:value[space]
So I wrote this code to parse it:
string myString = ReadinString();
string[] tokens = myString.Split(' ');
foreach (string token in tokens)
{
    string key = token.Split(':')[0];
    string value = token.Split(':')[1];
    . . . .
}
The issue now is that some of the values have spaces in them, so my "simplistic" split at the top no longer works. How can I still parse out the list of key value pairs (with space as the separator character) now that the value field can also contain spaces? A plain split doesn't seem like it can work anymore.
NOTE: I now confirmed that KEYs will NOT have spaces in them so I only have to worry about the values. Apologies for the confusion.
Use this regular expression:
\w+:[\w\s]+(?![\w+:])
I tested it on
test:testvalue test2:test value test3:testvalue3
It returns three matches:
test:testvalue
test2:test value
test3:testvalue3
You can change \w to any character set that can occur in your input.
Code for testing this:
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"\w+:[\w\s]+(?![\w+:])");
var test = "test:testvalue test2:test value test3:testvalue3";
foreach (Match match in regex.Matches(test))
{
    // Each match is one "key:value" pair; split it at the colon.
    var key = match.Value.Split(':')[0];
    var value = match.Value.Split(':')[1];
    Console.WriteLine("{0}:{1}", key, value);
}
Console.ReadLine();
As Wonko the Sane pointed out, this regular expression will fail on values that contain a colon. If you expect that situation, use \w+:[\w: ]+?(?![\w+:]) as the regular expression. This will still fail when a colon in a value is preceded by a space, though... I'll think about a solution to this.
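One possible way to tolerate colons inside values is sketched below. It assumes a key is a run of word characters followed by a colon, and that a new pair only ever starts after whitespace, so the value is taken to be everything up to the next "whitespace, key, colon" sequence or the end of the input. The class name KeyValueParser and the sample input are made up for illustration. It also uses named groups instead of splitting each match on ':' again, because by my reading the lazy [\w: ]+? pattern above stops at the first space and the second Split(':') cuts a value at its first colon. A value that itself contains " word:" is still ambiguous and will be split.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueParser
{
    // Assumption: keys consist of word characters only, and a new pair
    // begins only after whitespace. The value runs lazily up to the next
    // "<whitespace><key>:" or to the end of the input.
    private static readonly Regex PairPattern =
        new Regex(@"(?<key>\w+):(?<value>.*?)(?=\s+\w+:|\s*$)");

    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match match in PairPattern.Matches(input))
        {
            // Named groups avoid splitting the match on ':' a second time.
            pairs.Add(new KeyValuePair<string, string>(
                match.Groups["key"].Value,
                match.Groups["value"].Value));
        }
        return pairs;
    }
}

public static class Demo
{
    public static void Main()
    {
        var test = "test:testvalue test2:test value time:10:30";
        foreach (var pair in KeyValueParser.Parse(test))
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
        // Prints:
        // test = testvalue
        // test2 = test value
        // time = 10:30
    }
}
```

If keys can contain characters other than word characters, the \w in both places would need to be widened to match.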
This cannot work reliably unless you change the separator from a space to something else, such as "|".
Consider this:
Alfred Bester:Alfred Bester Alfred:Alfred Bester
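If the sending system can be changed, the parsing becomes straightforward. The following is a minimal sketch under the assumptions that '|' separates pairs, that '|' never appears inside a key or value, and that keys never contain ':'. The class name PipeDelimitedParser is hypothetical, and the sample input is the example above with the ambiguous space replaced by '|'.

```csharp
using System;
using System.Collections.Generic;

public static class PipeDelimitedParser
{
    // Assumption: the producer emits '|' between pairs and never inside
    // a key or a value.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        var tokens = input.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var token in tokens)
        {
            // Cut at the first ':' only, so colons inside the value survive.
            int colon = token.IndexOf(':');
            if (colon < 0)
            {
                continue; // Not a key:value pair; skipped in this sketch.
            }
            pairs.Add(new KeyValuePair<string, string>(
                token.Substring(0, colon),
                token.Substring(colon + 1)));
        }
        return pairs;
    }
}

public static class PipeDemo
{
    public static void Main()
    {
        var test = "Alfred Bester:Alfred Bester|Alfred:Alfred Bester";
        foreach (var pair in PipeDelimitedParser.Parse(test))
        {
            Console.WriteLine("[{0}] -> [{1}]", pair.Key, pair.Value);
        }
        // Prints:
        // [Alfred Bester] -> [Alfred Bester]
        // [Alfred] -> [Alfred Bester]
    }
}
```

With an unambiguous separator, spaces are allowed in both keys and values without any regular expression.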