How to get the desired formatted hash from a string in an efficient way with Ruby?





I want to build a hash in a particular format from a string like the one below:

Given string:

str = 'A
A = B
A = B = C
A = B = D
A = E = F
G = H
G = I
G = J'

# Into a hash like this (required hash pattern):

{
  "A" => {
    "B" => {
      "C" => nil,
      "D" => nil
    },
    "E" => {
      "F" => nil
    }
  },
  "G" => {
    "H" => nil,
    "I" => nil,
    "J" => nil
  }
}

I tried many ways, but this is the closest:

output = Hash.new
line_hash = Hash.new
str.each_line do |line|
  arr = line.split("=")
  e = arr.first.strip
  line_hash[e] = {}
  arr.each_with_index do |ele, i|
    break unless arr[i+1]
    line_hash[ele.strip] = arr[i+1] unless output.keys.include?(ele.strip)
  end
  output[e] = line_hash unless output.keys.include?(e)
end

3 Answers

str = "A\nA = B\nA = B = C\nA = B = D\nA = E = F\nG = H\nG = I\nG = J"

curr = h = {}

str.each_line { |l|
  l.chomp.split(/\s*=\s*/m).each { |c|
    curr = curr[c] ||= {}   # descend, creating the node if it is missing
  }
  curr = h                  # return to the root for the next line
}

puts h
# => {
#  "A" => {
#    "B" => {
#      "C" => {},
#      "D" => {}
#    },
#    "E" => {
#      "F" => {}
#    }
#  },
#  "G" => {
#    "H" => {},
#    "I" => {},
#    "J" => {}
#  }
# }

I hope you’ll excuse me for leaving empty hashes instead of nil values at the leaves, for the sake of clarity.

To nullify leaves:

def leaves_nil!(hash)
  hash.each { |k, v| v.empty? ? hash[k] = nil : leaves_nil!(hash[k]) }
end

You can also get that output with something like this:

str = 'A
A = B
A = B = C
A = B = D
A = E = F
G = H
G = I
G = J'

curr = h = {}
lines = str.split("\n").map{|t| t.split(/\s*=\s*/m) }
lines.each do |line|
  line.each { |c| curr = curr[c.strip] = curr[c.strip] || ((line.last == c) ? nil : {}) }
  curr = h
end


#=>  {
#        "A" => {
#            "B" => {
#                "C" => nil,
#                "D" => nil
#            }, "E" => {
#                "F" => nil
#            }
#        }, "G" => {
#            "H" => nil,
#            "I" => nil,
#            "J" => nil
#        }
#    }
This is another way that requires less data to build the hash. If, for example, the line

A = B = C = D

is present, there is no need for either of the following:

A = B
A = B = C

and the order of the lines is unimportant.


def hashify(str)
  str.lines.each_with_object({}) { |line, h|
    line.split(/\s*=\s*/).reduce(h) { |g,w|
      (w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {} } }
end


str =<<_
A = B = C
G = I
A = B = D
A = E = F
G = H
A = K
G = J
_

hashify(str)
  #=> {"A"=>{"B"=>{"C"=>nil, "D"=>nil}, "E"=>{"F"=>nil}, "K"=>nil},
  #    "G"=>{"I"=>nil, "H"=>nil, "J"=>nil}} 
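
The claim that line order does not matter can be checked with a quick sketch. The snippet below repeats hashify as given above, shuffles the lines of a sample string, and compares the two results. The variable names and the seed 42 are arbitrary choices for this illustration, and it assumes, as the answer does, that the redundant shorter lines have been left out.

```ruby
def hashify(str)
  str.lines.each_with_object({}) { |line, h|
    line.split(/\s*=\s*/).reduce(h) { |g, w|
      (w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {} } }
end

sample = "A = B = C\nG = I\nA = B = D\nA = E = F\nG = H\nA = K\nG = J\n"

# Shuffle the lines with a fixed seed so the run is repeatable.
shuffled = sample.lines.shuffle(random: Random.new(42)).join

original_result = hashify(sample)
shuffled_result = hashify(shuffled)

# Hash#== compares contents, not insertion order.
puts original_result == shuffled_result  # prints true
```

One caveat worth keeping in mind: a redundant shorter line such as "A = B" placed after "A = B = C" would reset "B" to nil, so such lines should be omitted or come first.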


The steps are as follows, shown for a shorter str whose six lines appear in the array below:

a = str.lines
  #=> ["A = B = C\n", "A = B = D\n", "A = E = F\n",
  #    "G = H\n", "G = I\n", "G = J\n"]

Notice that String#lines, unlike split("\n"), keeps the newline characters. Keeping them at this point is intentional; they serve an important purpose, as will be shown below.
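
A minimal illustration of that difference, using a two-line string chosen only for this example:

```ruby
text = "A = B\nG = H\n"

kept    = text.lines        # newline stays attached to each line
dropped = text.split("\n")  # newline is consumed as the separator

p kept     # ["A = B\n", "G = H\n"]
p dropped  # ["A = B", "G = H"]
```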

enum = a.each_with_object({})
  #=> #<Enumerator: ["A = B = C\n", "A = B = D\n", "A = E = F\n", "G = H\n",
  #                  "G = I\n", "G = J\n"]:each_with_object({})>

We can convert the enumerator to an array to see the elements that Array#each will pass to the block:

enum.to_a
  #=> [["A = B = C\n", {}], ["A = B = D\n", {}], ["A = E = F\n", {}],
  #    ["G = H\n", {}], ["G = I\n", {}], ["G = J\n", {}]]
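
One detail the array view hides is that every pair holds the same hash object rather than a fresh {} per line, which is why changes made while processing one line are visible when the next line is processed. A small check, with names chosen only for this sketch:

```ruby
a = ["A = B = C\n", "G = H\n"]
enum = a.each_with_object({})

pairs = enum.to_a
memo_ids = pairs.map { |_line, memo| memo.object_id }

# Every pair carries one and the same memo object.
puts memo_ids.uniq.size  # prints 1
```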

enum now invokes each to pass each element into the block:

enum.each { |line, h| line.split(/\s*=\s*/).reduce(h) { |g,w|
            (w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {} } }
  #=> {"A"=>{"B"=>{"C"=>nil, "D"=>nil}, "E"=>{"F"=>nil}},
  #    "G"=>{"H"=>nil, "I"=>nil, "J"=>nil}}

The first value that Array#each passes into the block is:

["A = B = C\n", {}]

which is decomposed or "disambiguated" into its two elements and assigned to the block variables:

line = "A = B = C\n"
h = {}

We now execute the code in the block:

b = line.split(/\s*=\s*/)
  #=> ["A", "B", "C\n"]
b.reduce(h) { |g,w|
  (w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {} }
  #=> nil

The initial value for reduce is the hash h that we are building, which is initially empty. When h and "A" are passed into the block,

g = h #=> {}
w = "A"

so (noting that double quotes are needed for "\n")

w[-1] == "\n"
  #=> "A" == "\n"
  #=> false

so we execute

g[w] ||= {}
  #=> g['A'] ||= {} 
  #=> g['A'] = g['A'] || {}
  #=> g['A'] = nil || {}
  #=> {}

so now

h #=> {"A"=>{}}

g[w] #=> {} is then passed back to reduce, and the block variables for the second element passed to the block are:

g = g["A"] #=> {}
w = "B"


Since

w[-1] == "\n" #=> false

we again execute

g[w] ||= {}
  #=> g["B"] ||= {}
  #=> {}

and now

h #=> {"A"=>{"B"=>{}}}

Lastly, [g["B"], "C\n"] is passed into the block, decomposed and assigned to the block variables:

g = g["B"] #=> {}
w = "C\n"

but the presence of the newline character in w results in

w[-1] == "\n" #=> true

telling us it is the last word in the line, so we need to strip off the newline character and set the value to nil:

g[w.chomp] = nil
  #=> g["C"] = nil 

resulting in:

h #=> {"A"=>{"B"=>{"C"=>nil}}}

Leaving the newline character in the string provided the needed "flag" for processing the last word on each line differently than the others.

The other lines are processed similarly.
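
The walk-through above can be replayed by printing the hash after each step. The sketch below uses the same reduce logic as hashify with a logging line added; the helper name trace_line is made up for this illustration and is not part of the answer's code.

```ruby
# Hypothetical helper: same logic as the block in hashify, plus logging.
def trace_line(line, h)
  line.split(/\s*=\s*/).reduce(h) do |g, w|
    last_word = (w[-1] == "\n")   # the newline flags the last word on the line
    result = last_word ? (g[w.chomp] = nil) : (g[w] ||= {})
    puts "w=#{w.inspect} last=#{last_word} h=#{h.inspect}"
    result                        # becomes g for the next word
  end
end

h = {}
trace_line("A = B = C\n", h)
trace_line("A = B = D\n", h)
p h
```

Each printed line shows the word being processed, whether it was treated as the last word, and the state of h at that point, which should match the intermediate values shown in the steps above.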

