How to get the desired formatted hash from a string in an efficient way with Ruby?

Tags:

string

ruby

hash

I want to build a hash in a particular format from a string that looks like this:

Given string:

str = 'A
A = B
A = B = C
A = B = D
A = E = F
G = H
G = I
G = J'

# Into a hash like this (required hash pattern):

{
  "A" => {
    "B" => {
      "C" => nil,
      "D" => nil
    },
    "E" => {
      "F" => nil
    },
  },
  "G" => {
     "H" => nil,
     "I" => nil,
     "J" => nil
  }
}

I tried many ways, but this is the closest:

output = Hash.new
line_hash = Hash.new
str.each_line do |line|
  arr = line.split("=")
  e = arr.first.strip
  line_hash[e] = {}
  arr.each_with_index do |ele, i|
    break unless arr[i+1]
    line_hash[ele.strip] = arr[i+1] unless output.keys.include?(ele.strip)
  end
  output[e] = line_hash unless output.keys.include?(e)
end
Sourabh Upadhyay asked Nov 21 '14 13:11


3 Answers

str = "A\nA = B\nA = B = C\nA = B = D\nA = E = F\nG = H\nG = I\nG = J"

curr = h = {}

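str.each_line do |l|
  # Walk down from the root, creating a nested hash for each new key
  l.chomp.split(/\s*=\s*/).each { |c| curr = curr[c] ||= {} }
  curr = h # reset to the root before the next line
end

h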
# => {
#  "A" => {
#    "B" => {
#      "C" => {},
#      "D" => {}
#    },
#    "E" => {
#      "F" => {}
#    }
#  },
#  "G" => {
#    "H" => {},
#    "I" => {},
#    "J" => {}
#  }
# }

I hope you’ll excuse me for leaving empty hashes instead of nil values at the leaves, for the sake of clarity.

To nullify leaves:

def leaves_nil! hash
   hash.each { |k,v| v.empty? ? hash[k] = nil : leaves_nil!(hash[k]) }  
end
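
Putting the two steps together, here is a minimal self-contained sketch. The wrapper name build_tree is hypothetical (it is not part of the answer above); it only packages the loop so the result can be passed straight to leaves_nil!.

```ruby
# Hypothetical wrapper around the loop above: builds the tree with {} at the leaves.
def build_tree(str)
  root = {}
  str.each_line do |line|
    curr = root
    # Descend one level per key, creating missing levels on the way.
    line.strip.split(/\s*=\s*/).each { |key| curr = (curr[key] ||= {}) }
  end
  root
end

# Replace every empty hash with nil, recursing into non-empty ones.
# Hash#each returns the receiver, so the method returns the modified hash.
def leaves_nil!(hash)
  hash.each { |k, v| v.empty? ? hash[k] = nil : leaves_nil!(v) }
end

str = "A\nA = B\nA = B = C\nA = B = D\nA = E = F\nG = H\nG = I\nG = J"
tree = leaves_nil!(build_tree(str))
# tree now has nil at "C", "D", "F", "H", "I" and "J"
```

Only values of existing keys are reassigned while iterating, so Ruby does not raise the "can't add a new key into hash during iteration" error here.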
Aleksei Matiushkin answered Nov 19 '22 01:11


You can also get that output with something like this:

str = 'A
A = B
A = B = C
A = B = D
A = E = F
G = H
G = I
G = J'

curr = h = {}
lines = str.split("\n").map { |t| t.split(/\s*=\s*/) }
lines.each do |line|
  # The last word on a line becomes a nil leaf unless the key already exists
  line.each { |c| curr = curr[c.strip] = curr[c.strip] || (line.last == c ? nil : {}) }
  curr = h
end

Output (the value of h):

#=>  {
#        "A" => {
#            "B" => {
#                "C" => nil,
#                "D" => nil
#            }, "E" => {
#                "F" => nil
#            }
#        }, "G" => {
#            "H" => nil,
#            "I" => nil,
#            "J" => nil
#        }
#    }
Yogesh dwivedi Geitpl answered Nov 19 '22 03:11


This is another way that requires less data to build the hash. If, for example, the line

A = B = C = D

is present, there is no need for either of the following:

A = B
A = B = C

and the order of the lines is unimportant.

Code

def hashify(str)
  str.lines.each_with_object({}) { |line, h|
    line.split(/\s*=\s*/).reduce(h) { |g,w|
      (w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {} } }
end

Example

str =<<_
A = B = C
G = I
A = B = D
A = E = F
G = H
A = K
G = J
_

hashify(str)
  #=> {"A"=>{"B"=>{"C"=>nil, "D"=>nil}, "E"=>{"F"=>nil}, "K"=>nil},
  #    "G"=>{"I"=>nil, "H"=>nil, "J"=>nil}} 
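
As a quick check of the two claims above (one long line is enough, and the order of such lines does not matter), here is a small sketch. It repeats hashify with an explicit if/else so that it runs on its own; the variable names single, shuffled and overwritten are only for illustration.

```ruby
def hashify(str)
  str.lines.each_with_object({}) do |line, h|
    line.split(/\s*=\s*/).reduce(h) do |g, w|
      if w.end_with?("\n")
        g[w.chomp] = nil # last word on the line: store a nil leaf
      else
        g[w] ||= {}      # inner word: descend, creating the level if needed
      end
    end
  end
end

# One line is enough to create every level above the leaf.
single = hashify("A = B = C = D\n")

# The same kind of lines in a different order give the same hash contents.
shuffled = hashify("G = J\nA = E = F\nA = B = D\nG = H\nA = B = C\n")

# Caveat: if a shorter prefix line comes AFTER a longer one, its nil leaf
# replaces the subtree that was already built.
overwritten = hashify("A = B = C\nA = B\n")
```

So the order-independence holds when the redundant prefix lines are left out, which is the situation the answer describes.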

Explanation

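For illustration, take a str that consists of the six lines A = B = C, A = B = D, A = E = F, G = H, G = I and G = J: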

a = str.lines
  #=> ["A = B = C\n", "A = B = D\n", "A = E = F\n",
  #    "G = H\n", "G = I\n", "G = J\n"]

Notice that String#lines, unlike split("\n"), keeps the newline characters. Keeping them at this point is intentional; they serve an important purpose, as will be shown below.
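
To make the difference concrete, here is a tiny sketch (the two-line sample string and the variable names are only for illustration):

```ruby
sample = "A = B = C\nG = H\n"

with_newlines    = sample.lines       # each element still ends in "\n"
without_newlines = sample.split("\n") # the separator is dropped

p with_newlines    # prints ["A = B = C\n", "G = H\n"]
p without_newlines # prints ["A = B = C", "G = H"]
```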

enum = a.each_with_object({})
  #=> #<Enumerator: ["A = B = C\n", "A = B = D\n", "A = E = F\n", "G = H\n",
  #                  "G = I\n", "G = J\n"]:each_with_object({})>

We can convert the enumerator to an array to see the elements that Array#each will pass to the block:

enum.to_a
  #=> [["A = B = C\n", {}], ["A = B = D\n", {}], ["A = E = F\n", {}],
  #    ["G = H\n", {}], ["G = I\n", {}], ["G = J\n", {}]]

enum now invokes each to pass each element into the block:

enum.each { |line, h| line.split(/\s*=\s*/).reduce(h) { |g,w|
            (w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {} } }
  #=> {"A"=>{"B"=>{"C"=>nil, "D"=>nil}, "E"=>{"F"=>nil}},
  #    "G"=>{"H"=>nil, "I"=>nil, "J"=>nil}}

The first value that Array#each passes into the block is:

["A = B = C\n", {}]

which is decomposed (or "destructured") into its two elements and assigned to the block variables:

line = "A = B = C\n"
h = {}

We now execute the code in the block:

b = line.split(/\s*=\s*/)
  #=> ["A", "B", "C\n"]
b.reduce(h) { |g,w|
  (w[-1] == "\n") ? g[w.chomp] = nil : g[w] ||= {} }
  #=> nil

The initial value for reduce is the hash h that we are building, which is initially empty. When h and "A" are passed into the block,

g = h #=> {}
w = "A"

so (noting that double quotes are needed for "\n")

w[-1] == "\n"
  #=> "A" == "\n"
  #=> false
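
The remark about double quotes matters because Ruby only interprets the \n escape inside double-quoted strings. A short sketch (variable names are only for illustration):

```ruby
single_quoted = '\n' # two characters: a backslash followed by the letter n
double_quoted = "\n" # one character: a newline

p single_quoted.length          # prints 2
p double_quoted.length          # prints 1
p "C\n"[-1] == double_quoted    # prints true
p "C\n"[-1] == single_quoted    # prints false
```

With '\n' the comparison is never true, so every word, including the last one on a line, would be treated as an inner key and keep its trailing newline.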

so we execute

g[w] ||= {}
  #=> g['A'] ||= {} 
  #=> g['A'] = g['A'] || {}
  #=> g['A'] = nil || {}
  #=> {}

so now

h #=> {"A"=>{}}

The block's return value, g[w] #=> {}, is then passed back to reduce, and the block variables for the second element passed to the block are:

g = g["A"] #=> {}
w = "B"

Since

w[-1] == "\n" #=> false

we again execute

g[w] ||= {}
  #=> g["B"] ||= {}
  #=> {}

and now

h #=> {"A"=>{"B"=>{}}}

Lastly, [g["B"], "C\n"] is passed into the block, decomposed and assigned to the block variables:

g = g["B"] #=> {}
w = "C\n"

but the presence of the newline character in w results in

w[-1] == "\n" #=> true

telling us it is the last word in the line, so we need to strip off the newline character and set the value to nil:

g[w.chomp] = nil
  #=> g["C"] = nil 

resulting in:

h #=> {"A"=>{"B"=>{"C"=>nil}}}

Leaving the newline character in the string provided the needed "flag" for processing the last word on each line differently than the others.

The other lines are processed similarly.
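
The newline flag assumes that every line, including the last, ends in "\n" (true for the heredoc above, but not for the quoted string in the question). As a hedged variation, not the method above, the last word can instead be picked out by position. The name hashify_stripped is hypothetical.

```ruby
# Hypothetical variation: identify the leaf by position rather than by a
# trailing newline, so a final line without "\n" is handled as well.
def hashify_stripped(str)
  str.lines.each_with_object({}) do |line, h|
    # The last word is the leaf; everything before it is the path to it.
    *path, leaf = line.strip.split(/\s*=\s*/)
    next if leaf.nil? # skip blank lines

    node = path.reduce(h) { |g, w| g[w] ||= {} }
    # Create the leaf as nil, but keep a subtree that is already there.
    node[leaf] ||= nil
  end
end

str = "A\nA = B\nA = B = C\nA = B = D\nA = E = F\nG = H\nG = I\nG = J"
result = hashify_stripped(str)
```

Because node[leaf] ||= nil never replaces an existing subtree, this sketch also tolerates a shorter line such as A = B appearing after A = B = C.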

Cary Swoveland answered Nov 19 '22 03:11