I want to build a hash in a particular format from a string in the format below:
Given string:
str = 'A
A = B
A = B = C
A = B = D
A = E = F
G = H
G = I
G = J'
# Into a hash like this (required hash pattern):
{
"A" => {
"B" => {
"C" => nil,
"D" => nil
},
"E" => {
"F" => nil
},
},
"G" => {
"H" => nil,
"I" => nil,
"J" => nil
}
}
I tried many ways, but this is the closest:
output = Hash.new
line_hash = Hash.new
str.each_line do |line|
  arr = line.split("=")
  e = arr.first.strip
  line_hash[e] = {}
  arr.each_with_index do |ele, i|
    break unless arr[i+1]
    line_hash[ele.strip] = arr[i+1] unless output.keys.include?(ele.strip)
  end
  output[e] = line_hash unless output.keys.include?(e)
end
str = "A\nA = B\nA = B = C\nA = B = D\nA = E = F\nG = H\nG = I\nG = J"
curr = h = {}
str.each_line { |l|
  l.chomp.split(/\s*=\s*/m).each { |c|
    curr = curr[c] ||= {}
  }
  curr = h
}
puts h
# => {
# "A" => {
# "B" => {
# "C" => {},
# "D" => {}
# },
# "E" => {
# "F" => {}
# }
# },
# "G" => {
# "H" => {},
# "I" => {},
# "J" => {}
# }
# }
I hope you'll excuse me for leaving empty hashes instead of nil values at the leaves, for the sake of clarity.
To set the leaves to nil:
def leaves_nil!(hash)
  hash.each { |k,v| v.empty? ? hash[k] = nil : leaves_nil!(hash[k]) }
end
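Putting the two pieces together, a minimal sketch might look like the following. The method name build_tree is only an illustrative wrapper around the loop above (it is not part of the original answer), and it resets the cursor at the start of each line instead of at the end.

```ruby
# Hypothetical wrapper around the loop shown above: builds the nested
# hash with empty hashes at the leaves.
def build_tree(str)
  root = {}
  str.each_line do |line|
    curr = root  # start again from the top for every line
    line.chomp.split(/\s*=\s*/).each { |key| curr = (curr[key] ||= {}) }
  end
  root
end

# Replace every empty-hash leaf with nil, recursing into non-empty hashes.
# Hash#each returns the receiver, so the method returns the modified hash.
def leaves_nil!(hash)
  hash.each { |k, v| v.empty? ? hash[k] = nil : leaves_nil!(v) }
end

str = "A\nA = B\nA = B = C\nA = B = D\nA = E = F\nG = H\nG = I\nG = J"
tree = leaves_nil!(build_tree(str))
p tree
```

Only values of existing keys are reassigned while iterating, so modifying the hash inside each is safe here.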
You can also get that output with something like this:
str = 'A
A = B
A = B = C
A = B = D
A = E = F
G = H
G = I
G = J'
curr = h = {}
lines = str.split("\n").map{|t| t.split(/\s*=\s*/m) }
lines.each do |line|
  # the last word on a line becomes a nil leaf unless it already has children
  line.each { |c| curr = curr[c.strip] = curr[c.strip] || ((line.last == c) ? nil : {}) }
  curr = h
end
h
#=> {
# "A" => {
# "B" => {
# "C" => nil,
# "D" => nil
# }, "E" => {
# "F" => nil
# }
# }, "G" => {
# "H" => nil,
# "I" => nil,
# "J" => nil
# }
# }
This is another way that requires less data to build the hash. If, for example, the line
A = B = C = D
is present, there is no need for either of the following:
A = B
A = B = C
and the order of the lines is unimportant.
Code
def hashify(str)
  str.lines.each_with_object({}) { |line, h|
    line.split(/\s*=\s*/).reduce(h) { |g,w|
      (w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {} } }
end
Example
str =<<_
A = B = C
G = I
A = B = D
A = E = F
G = H
A = K
G = J
_
hashify(str)
#=> {"A"=>{"B"=>{"C"=>nil, "D"=>nil}, "E"=>{"F"=>nil}, "K"=>nil},
# "G"=>{"I"=>nil, "H"=>nil, "J"=>nil}}
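The two claims above (one deep line is enough, and line order does not matter) can be checked with a short sketch. It repeats the hashify definition so that it runs on its own; the input strings are made up for illustration.

```ruby
def hashify(str)
  str.lines.each_with_object({}) do |line, h|
    line.split(/\s*=\s*/).reduce(h) do |g, w|
      # A trailing newline marks the last word on the line.
      (w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {}
    end
  end
end

# A single deep line creates every intermediate level.
deep = hashify("A = B = C = D\n")
p deep

# The same lines in a different order give an equal hash
# (Hash#== ignores insertion order).
forward  = hashify("A = B = C\nG = H\nA = E = F\n")
backward = hashify("A = E = F\nG = H\nA = B = C\n")
p forward == backward
```

One thing to watch with this definition: it relies on the shorter prefix lines being absent. If a line such as A = B appears after A = B = C, the last word B is reset to nil and the subtree below it is lost.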
Explanation
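To keep the walkthrough short, suppose str contains only the six lines shown in the array below (a shorter input than in the example above):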
a = str.lines
#=> ["A = B = C\n", "A = B = D\n", "A = E = F\n",
# "G = H\n", "G = I\n", "G = J\n"]
Notice that String#lines, unlike split("\n"), keeps the newline characters. Keeping them at this point is intentional; they serve an important purpose, as will be shown below.
enum = a.each_with_object({})
#=> #<Enumerator: ["A = B = C\n", "A = B = D\n", "A = E = F\n", "G = H\n",
# "G = I\n", "G = J\n"]:each_with_object({})>
We can convert the enumerator to an array to see the elements that each will pass to the block:
enum.to_a
#=> [["A = B = C\n", {}], ["A = B = D\n", {}], ["A = E = F\n", {}],
# ["G = H\n", {}], ["G = I\n", {}], ["G = J\n", {}]]
enum now invokes each to pass each element into the block:
enum.each { |line, h| line.split(/\s*=\s*/).reduce(h) { |g,w|
(w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {} } }
#=> {"A"=>{"B"=>{"C"=>nil, "D"=>nil}, "E"=>{"F"=>nil}},
# "G"=>{"H"=>nil, "I"=>nil, "J"=>nil}}
The first element that each passes into the block is:
["A = B = C\n", {}]
which is decomposed into its two elements, which are assigned to the block variables:
line = "A = B = C\n"
h = {}
We now execute the code in the block:
b = line.split(/\s*=\s*/)
#=> ["A", "B", "C\n"]
b.reduce(h) { |g,w|
(w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {} }
#=> nil (the value of the last assignment; h itself is updated in place)
The initial value for reduce is the hash h that we are building, which is initially empty. When h and "A" are passed into the block,
g = h #=> {}
w = "A"
so (noting that double quotes are needed for "\n", because '\n' in single quotes is a backslash followed by the letter n)
w[-1] == "\n"
#=> "A" == "\n"
#=> false
so we execute
g[w] ||= {}
#=> g['A'] ||= {}
#=> g['A'] = g['A'] || {}
#=> g['A'] = nil || {}
#=> {}
so now
h #=> {"A"=>{}}
The value of g[w], which is {}, is then passed back to reduce, and the block variables for the second element passed to the block are:
g = g["A"] #=> {}
w = "B"
Since
w[-1] == "\n" #=> false
we again execute
g[w] ||= {}
#=> g["B"] ||= {} #=> {}
and now
h #=> {"A"=>{"B"=>{}}}
Lastly, [g["B"], "C\n"] is passed into the block, decomposed and assigned to the block variables:
g = g["B"] #=> {}
w = "C\n"
but the presence of the newline character in w results in
w[-1] == "\n" #=> true
telling us it is the last word in the line, so we need to strip off the newline character and set the value to nil:
g[w.chomp] = nil
#=> g["C"] = nil
resulting in:
h #=> {"A"=>{"B"=>{"C"=>nil}}}
Leaving the newline character in the string provided the needed "flag" for processing the last word on each line differently than the others.
The other lines are processed similarly.
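If relying on the trailing newline feels fragile (the last line of a string may not end with one), the last word can instead be identified by its position. The following is only a sketch of that alternative, not the method explained above; the name hashify_by_position is made up, and unlike hashify it leaves an existing subtree alone when a shorter line repeats one of its prefixes.

```ruby
# Hypothetical variant: the last word on a line is found by position,
# not by a trailing newline.
def hashify_by_position(str)
  str.lines.each_with_object({}) do |line, h|
    # Split off the last word; parents holds everything before it.
    *parents, leaf = line.strip.split(/\s*=\s*/)
    next if leaf.nil?  # blank line: nothing to add

    # Walk (and create) the intermediate levels.
    node = parents.reduce(h) { |g, w| g[w] ||= {} }

    # Add the leaf as nil, but keep any subtree that is already there.
    node[leaf] = nil unless node.key?(leaf)
  end
end

str = "A\nA = B\nA = B = C\nA = B = D\nA = E = F\nG = H\nG = I\nG = J"
result = hashify_by_position(str)
p result
```

With the string from the question, which includes the shorter lines and has no trailing newline, this should produce the required hash.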