How can I easily parse a document that has this structure?
description
some line of text
another line of text
more lines of text
quality
3 47 88 4 4 4 4
text: type 1
stats some funny stats
description
some line of text2
another line of text2
more lines of text2
quality
1 2 4 6 7
text: type 1
stats some funny stats
.
.
.
Ideally I would want an array of hashes, where each hash represents a 'section' of the document and looks something like this:
{:description => "some line of text another line of text more lines of text", :quality => "3 47 88 4 4 4 4", :text => "type 1", :stats => "some funny stats"}
You should look for the indicator lines (description, quality, text and stats) in a loop and fill the hash while processing the document line by line.
Another option would be to parse the whole document at once with regular expressions, but you don't really need them here, and if you're not familiar with regexes I'd recommend against that route.
UPDATE:
sections = []
File.open("deneme") do |f|
  current = {:description => "", :text => "", :quality => "", :stats => ""}
  in_description = false
  in_quality = false
  f.each_line do |line|
    stripped = line.strip
    if stripped == "description"
      in_description = true
    elsif stripped == "quality"
      # The "quality" marker also ends the description block.
      in_description = false
      in_quality = true
    elsif in_quality
      # The line right after "quality" holds the value.
      current[:quality] = stripped
      in_quality = false
    elsif in_description
      # Join description lines with single spaces; skip blank lines.
      unless stripped.empty?
        current[:description] += " " unless current[:description].empty?
        current[:description] += stripped
      end
    elsif line.match(/^text: /)
      current[:text] = line[6..-1].strip
    elsif line.match(/^stats /)
      # "stats" is the last line of a section: store it and start a new one.
      current[:stats] = line[6..-1].strip
      sections.push(current)
      current = {:description => "", :text => "", :quality => "", :stats => ""}
    end
  end
end
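For comparison, the regular-expression alternative mentioned earlier could look roughly like the sketch below. It is not part of the original answer: the names SECTION_PATTERN and parse_sections are made up for illustration, and it assumes every section contains all four markers in exactly the order shown in the question (description, quality, text:, stats).

```ruby
# Hypothetical sketch: SECTION_PATTERN and parse_sections are illustrative
# names, not an existing API. Assumes each section has all four markers in
# the order description, quality, text:, stats.
SECTION_PATTERN = %r{
  ^description\s*?\n   # marker line that opens a section
  (.*?)                # group 1: description lines, matched lazily
  ^quality\s*?\n       # marker line for the quality value
  ([^\n]*)\n\s*        # group 2: the single line after the quality marker
  ^text:([^\n]*)\n\s*  # group 3: rest of the text line
  ^stats\s([^\n]*)     # group 4: rest of the stats line
}mx

def parse_sections(text)
  text.scan(SECTION_PATTERN).map do |description, quality, type, stats|
    {
      # Collapse the description lines into one space-separated string.
      :description => description.lines.map(&:strip).reject(&:empty?).join(" "),
      :quality     => quality.strip,
      :text        => type.strip,
      :stats       => stats.strip
    }
  end
end

# Example with an inline string; for a real file you would pass
# File.read("deneme") instead.
sample = <<~DOC
  description
  some line of text
  another line of text
  more lines of text
  quality
  3 47 88 4 4 4 4
  text: type 1
  stats some funny stats
DOC

sections = parse_sections(sample)
puts sections.first[:description]
# prints: some line of text another line of text more lines of text
```

This reads the whole document into memory and silently skips any section that does not match the expected order, so the line-by-line loop is probably still the safer choice for messy input.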