I have about thirty thousand records with a string column that has been stored in the following format, with different keys:
"something: this, this and that, that, other stuff, another: name, another name, last: here"
In Rails, I want to change it into a hash like
{
  something: [ "this", "this and that", "that", "other stuff" ],
  another: [ "name", "another name" ],
  last: [ "here" ]
}
Is there a way to do this elegantly? I was thinking of splitting at the colon, then doing a reverse search for the first space.
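The split-at-the-colon idea can work, provided every key is a single word preceded by a space and no value contains a colon. Both conditions hold for the sample string but are assumptions about the real records. Below is a minimal sketch of that approach. The method name parse_by_colon_split is hypothetical:

```ruby
str = "something: this, this and that, that, other stuff, another: name, another name, last: here"

# Hypothetical helper illustrating "split at the colon, then search backwards
# for the first space". Assumes keys are single words preceded by a space and
# that values never contain a colon.
def parse_by_colon_split(str)
  chunks = str.split(":")            # key, then "values + next key" chunks
  key = chunks.shift.strip.to_sym    # the very first chunk is just a key
  result = {}

  chunks.each_with_index do |chunk, i|
    if i == chunks.size - 1
      # The last chunk holds only values; there is no following key.
      values, next_key = chunk, nil
    else
      # Reverse search: the text after the last space is the next key.
      cut = chunk.rindex(" ") or raise ArgumentError, "no key in #{chunk.inspect}"
      values = chunk[0...cut]
      next_key = chunk[(cut + 1)..-1].to_sym
    end
    # Drop surrounding whitespace, then split on commas; String#split
    # discards the trailing empty string left by the final comma.
    result[key] = values.strip.split(/,\s*/)
    key = next_key
  end

  result
end

p parse_by_colon_split(str)
```

A key made of several words, such as "first name:", would break this approach. The reverse search would keep only "name" as the key, which is why the regex-based answers below are more robust.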
There are about a hundred ways to solve this. A pretty straightforward one is this:
str = "something: this, this and that, that, other stuff, another: name, another name, last: here"
key = nil
str.scan(/\s*([^,:]+)(:)?\s*/).each_with_object({}) do |(val, colon), hsh|
  if colon
    key = val.to_sym
    hsh[key] = []
  else
    hsh[key] << val
  end
end
# => {
#      something: ["this", "this and that", "that", "other stuff"],
#      another: ["name", "another name"],
#      last: ["here"]
#    }
It works by scanning the string with the following regular expression:
/
  \s*        # any amount of optional whitespace
  ([^,:]+)   # one or more characters that aren't , or : (capture 1)
  (:)?       # an optional trailing : (capture 2)
  \s*        # any amount of optional whitespace
/x
Then it iterates over the matches and puts them into a hash. When a match has a trailing colon (capture 2), a new hash key is created with an empty array for a value. Otherwise the value (capture 1) is added to the array for the most recent key.
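It can help to look at what scan passes to that loop. This is a quick check that reuses the same string and pattern. Each element is a [value, colon] pair, and colon is nil for plain values:

```ruby
str = "something: this, this and that, that, other stuff, another: name, another name, last: here"

# Each element is [capture 1, capture 2]; capture 2 is ":" for keys and nil otherwise.
pairs = str.scan(/\s*([^,:]+)(:)?\s*/)

pairs.each do |val, colon|
  kind = colon ? "key" : "value"
  puts "#{kind}: #{val.inspect}"
end
```

Because the loop only remembers the most recent key, a record that starts with a value instead of a key would leave key as nil. In that case hsh[key] << val raises NoMethodError, so records that may not start with "key:" need a guard.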
A somewhat less straightforward but cleverer approach is to let the regular expression do more of the work:
MATCH_LIST_ENTRY = /([^:]+):\s*((?:[^,]+(?:,\s*|$))+?)(?=[^:,]+:|$)/
def parse_list2(str)
  str.scan(MATCH_LIST_ENTRY).map do |k, vs|
    [k.to_sym, vs.split(/,\s*/)]
  end.to_h
end
I won't pick apart the regular expression for this one, but it's simpler than it looks. Regexper does a pretty good job of explaining it.
You can see both of these in action on repl.it here: https://repl.it/@jrunning/LongtermMidnightblueAssembler
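If you would rather not treat MATCH_LIST_ENTRY as a black box, below is a sketch of the same pattern in Ruby's free-spacing (x) mode with each piece commented. The name MATCH_LIST_ENTRY_COMMENTED is hypothetical. The final lines compare the two patterns only on the sample string, which is a sanity check and not a proof that they match identically on every input:

```ruby
str = "something: this, this and that, that, other stuff, another: name, another name, last: here"

MATCH_LIST_ENTRY = /([^:]+):\s*((?:[^,]+(?:,\s*|$))+?)(?=[^:,]+:|$)/

# The same pattern in free-spacing mode: whitespace and comments are ignored.
MATCH_LIST_ENTRY_COMMENTED = /
  ([^:]+)           # capture 1: the key, everything up to the next colon
  :\s*              # the colon and any whitespace after it
  (                 # capture 2: the raw comma-separated values
    (?:
      [^,]+         # one value, a run of non-comma characters
      (?:,\s*|$)    # then a comma plus optional whitespace, or end of line
    )+?             # lazily: take as few values as possible ...
  )
  (?=[^:,]+:|$)     # ... stopping once the next key and colon, or end of line, follows
/x

def parse_list2(str)
  str.scan(MATCH_LIST_ENTRY).map do |k, vs|
    [k.to_sym, vs.split(/,\s*/)]
  end.to_h
end

# Capture 2 still carries the trailing ", " of each group; split drops it.
p str.scan(MATCH_LIST_ENTRY_COMMENTED)
p str.scan(MATCH_LIST_ENTRY_COMMENTED) == str.scan(MATCH_LIST_ENTRY)
p parse_list2(str)
```

The lazy +? is what stops each value list at the next "key:". A greedy + would keep swallowing values until the lookahead could only succeed at the end of the string.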
If str is the string given in the example, the desired hash can be constructed as follows.
str.split(/, *(?=\p{L}+:)/).
  each_with_object({}) do |s, h|
    k, v = s.split(/: +/)
    h[k.to_sym] = v.split(/, */)
  end
#=> {:something=>["this", "this and that", "that", "other stuff"],
# :another=>["name", "another name"],
# :last=>["here"]}
Note:
str.split(/, *(?=\p{L}+:)/)
#=> ["something: this, this and that, that, other stuff",
# "another: name, another name",
# "last: here"]
This regular expression reads, "match a comma followed by zero or more spaces, provided the match is immediately followed by one or more Unicode letters and then a colon". The (?=\p{L}+:) part is a positive lookahead, so the following key and its colon are checked but not consumed by the split.
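To see why the lookahead matters, compare it with a plain comma split and with a variant that consumes the key instead of just looking ahead at it. This is a short sketch using the same str:

```ruby
str = "something: this, this and that, that, other stuff, another: name, another name, last: here"

# Every comma is a split point, so the key/value grouping is lost.
plain = str.split(/, */)

# Only commas that precede "word:" are split points; the zero-width
# lookahead leaves the key at the start of the next piece.
grouped = str.split(/, *(?=\p{L}+:)/)

# Matching the key without a lookahead makes it part of the separator,
# so split throws the keys "another" and "last" away.
consumed = str.split(/, *\p{L}+:/)

p plain
p grouped
p consumed
```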