How can I efficiently extract the key-value pairs from a string into a HashMap
when
key
is always followed by :
and then the valuevalue
ends with a ,
followed by another key
(sometimes whitespace and then key
)value
can contain , :
throughoutvalue
will include any key
key
s are not fixedkey
names are knownFor these key-value pairs
key1:value1, key2:this is, some value2, key3:anothe:r val,ue,
It should produce this HashMap
:
"key1", "value1"
"key2", "this is, some value2"
"key3", "anothe:r val,ue"
I have tried the following code but it is no good with just a ,
as a delimiter as the value can contain commas throughout.
"key1:value1, key2:this is, some value2, key3:anothe:r val,ue,"
.split(",")
.map(|kv| kv.splitn(2, ":").collect::<Vec<&str>>())
.filter(|vec| vec.len() == 2)
.map(|vec| (vec[0].trim().into(), vec[1].trim().into()))
.collect()
My thought would be to provide a list of keys: ["key1", "key2", "key3"]
to use as delimiters
UPDATE:
Using @Lucretiel answer I have come up with:
fn key_value<'a>(keys: &[&str], mut command: &'a str) -> HashMap<&'a str, &'a str> {
let mut hashmap = HashMap::new();
loop {
if let Some(key) = key(&keys, &command) {
command = &command[key.len() + 1..];
let value = value(&keys, &command);
let trim: &[_] = &[',', ' '];
command = &command[value.len()..].trim_start_matches(trim);
hashmap.insert(key, value);
} else {
break;
}
}
hashmap
}
fn key<'a>(keys: &[&str], command: &'a str) -> Option<&'a str> {
let regex = format!("^({}):", keys.join("|"));
let regex = regex::Regex::new(®ex).expect("Invalid regex");
match regex.shortest_match(&command) {
Some(position) => Some(&command[..position - 1]),
None => None,
}
}
fn value<'a>(keys: &[&str], command: &'a str) -> &'a str {
let regex = format!(r#",\s*({}):"#, keys.join("|"));
let regex = regex::Regex::new(®ex).expect("Invalid regex");
match regex.find(&command) {
Some(position) => &command[..position.start()],
None => command,
}
}
(Playground)
The actual code to solve this is nontrivial, but it can be done. There's a lot of little fiddly edge cases, depending on what error cases you want to account for (for instance, do you require that every key in your known key list is present in the input string to parse? Do you allow duplicate keys? etc.). The basic algorithm looks like this:
^{key}:
. This is the current key.
,\s*{key}:
. This is the next key.
There's no way to do this with a conventional grammar; as presented it's very ambiguous. However, if you structure your parsing around scanning for each subsequent key (assuming that keys never appear as substrings in values) you can successfully parse this kind of input.
The algorithm as described runs in quadratic time, but hypothetically it should be reducible to linear time if you create a composite regular expression to search for every key simultaneously:
,\s*(key1|key2|key3|...):
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With