I have to handle deep nesting of ul, ol, and li tags and render them the same way a browser does. I want to reproduce the following example in a PDF file:
text = "
<body>
<ol>
<li>One</li>
<li>Two
<ol>
<li>Inner One</li>
<li>inner Two
<ul>
<li>hey
<ol>
<li>hiiiiiiiii</li>
<li>why</li>
<li>hiiiiiiiii</li>
</ol>
</li>
<li>aniket </li>
</li>
</ul>
<li>sup </li>
<li>there </li>
</ol>
<li>hey </li>
<li>Three</li>
</li>
</ol>
<ol>
<li>Introduction</li>
<ol>
<li>Introduction</li>
</ol>
<li>Description</li>
<li>Observation</li>
<li>Results</li>
<li>Summary</li>
</ol>
<ul>
<li>Introduction</li>
<li>Description
<ul>
<li>Observation
<ul>
<li>Results
<ul>
<li>Summary</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Overview</li>
</ul>
</body>"
I have to use Prawn for this task, but Prawn doesn't support HTML tags. So I came up with a solution using Nokogiri: I parse the HTML, prepend the list prefixes, and later remove the tags with gsub. The solution below works for part of the content above, but the problem is that the nesting of ul and ol can vary.
RULES = {
ol: {
1 => ->(index) { "#{index + 1}. " },
2 => ->(index) { "#{}" },
3 => ->(index) { "#{}" },
4 => ->(index) { "#{}" }
},
ul: {
1 => ->(_) { "\u2022 " },
2 => ->(_) { "" },
3 => ->(_) { "" },
4 => ->(_) { "" },
}
}
def ol_rule(group, deepness: 1)
group.search('> li').each_with_index do |item, i|
prefix = RULES[:ol][deepness].call(i)
item.prepend_child(prefix)
descend(item, deepness + 1)
end
end
def ul_rule(group, deepness: 1)
group.search('> li').each_with_index do |item, i|
prefix = RULES[:ul][deepness].call(i)
item.prepend_child(prefix)
descend(item, deepness + 1)
end
end
def descend(item, deepness)
item.search('> ol').each do |ol|
ol_rule(ol, deepness: deepness)
end
item.search('> ul').each do |ul|
ul_rule(ul, deepness: deepness)
end
end
doc = Nokogiri::HTML.fragment(text)
doc.search('ol').each do |group|
ol_rule(group, deepness: 1)
end
doc.search('ul').each do |group|
ul_rule(group, deepness: 1)
end
puts doc.inner_text
1. One
2. Two
1. Inner One
2. inner Two
• hey
1. hiiiiiiiii
2. why
3. hiiiiiiiii
• aniket
3. sup
4. there
3. hey
4. Three
1. Introduction
1. Introduction
2. Description
3. Observation
4. Results
5. Summary
• Introduction
• Description
• Observation
• Results
• Summary
• Overview
Problem
1) How do I handle indentation (leading spaces) when working with ul and ol tags?
2) How do I handle deep nesting, where li elements sit inside nested ul or ol lists?
I've come up with a solution that handles multiple indentation levels with configurable numbering rules per level:
require 'nokogiri'
ROMANS = %w[i ii iii iv v vi vii viii ix]
RULES = {
ol: {
1 => ->(index) { "#{index + 1}. " },
2 => ->(index) { "#{('a'..'z').to_a[index]}. " },
3 => ->(index) { "#{ROMANS.to_a[index]}. " },
4 => ->(index) { "#{ROMANS.to_a[index].upcase}. " }
},
ul: {
1 => ->(_) { "\u2022 " },
2 => ->(_) { "\u25E6 " },
3 => ->(_) { "* " },
4 => ->(_) { "- " },
}
}
def ol_rule(group, deepness: 1)
group.search('> li').each_with_index do |item, i|
prefix = RULES[:ol][deepness].call(i)
item.prepend_child(prefix)
descend(item, deepness + 1)
end
end
def ul_rule(group, deepness: 1)
group.search('> li').each_with_index do |item, i|
prefix = RULES[:ul][deepness].call(i)
item.prepend_child(prefix)
descend(item, deepness + 1)
end
end
def descend(item, deepness)
item.search('> ol').each do |ol|
ol_rule(ol, deepness: deepness)
end
item.search('> ul').each do |ul|
ul_rule(ul, deepness: deepness)
end
end
doc = Nokogiri::HTML.fragment(text)
doc.search('ol:root').each do |group|
ol_rule(group, deepness: 1)
end
doc.search('ul:root').each do |group|
ul_rule(group, deepness: 1)
end
You can then remove the tags or use doc.inner_text depending on your environment.
Two caveats though:
Current Output:
1. One
2. Two
a. Inner One
b. inner Two
◦ hey
◦ hey
3. hey
4. hey
hey
Three
1. Introduction
a. Introduction
2. Description
3. Observation
4. Results
5. Summary
• Introduction
• Description
◦ Observation
* Results
- Summary
• Overview
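The rule tables above only define levels 1 to 4, so a list nested five levels deep would look up a missing key and fail when calling `nil`. A minimal sketch of one way to guard against that, assuming it is acceptable to reuse the deepest defined rule for anything deeper. The names `SAMPLE_OL_RULES`, `ol_prefix`, and `list_line` are hypothetical and not part of the code above; the two-space indent per level is also an assumption.

```ruby
# Hypothetical rule table with the same shape as RULES[:ol] above.
SAMPLE_ROMANS = %w[i ii iii iv v vi vii viii ix].freeze
SAMPLE_OL_RULES = {
  1 => ->(index) { "#{index + 1}. " },
  2 => ->(index) { "#{('a'..'z').to_a[index]}. " },
  3 => ->(index) { "#{SAMPLE_ROMANS[index]}. " }
}.freeze

# Clamp the depth to the deepest level that has a rule, so deeper
# nesting reuses that rule instead of hitting a missing key.
def ol_prefix(depth, index, rules = SAMPLE_OL_RULES)
  level = depth.clamp(1, rules.keys.max)
  rules.fetch(level).call(index)
end

# Build one output line: two spaces of indent per nesting level
# (an assumed width), followed by the prefix and the item text.
def list_line(depth, index, text)
  "#{'  ' * (depth - 1)}#{ol_prefix(depth, index)}#{text}"
end

puts list_line(1, 0, "One")
puts list_line(2, 1, "inner Two")
puts list_line(7, 2, "very deep")
```

The same clamping idea would apply to the ul rules. Note that the sketch does not handle item counts beyond what each rule supports (for example, more than nine items at the Roman-numeral level).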
Firstly, to handle spacing, I used a hack in the lambda calls: a separate space rule per nesting level. I also use Nokogiri's add_previous_sibling to insert that spacing in front of each li. Lastly, Prawn doesn't keep leading spaces when we deal with ul and ol content, so I replace each run of leading whitespace with non-breaking spaces using gsub(/^([^\S\r\n]+)/m) { |m| "\xC2\xA0" * m.size }.
Note: Nokogiri repairs invalid HTML on its own, and the repaired tree may not match the nesting you intended, so always provide valid HTML.
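To illustrate the whitespace substitution described above in isolation, here is a small sketch that runs without Prawn or Nokogiri. The helper name `preserve_indent` and the sample string are assumptions made for the example; `"\u00A0"` is the non-breaking space that the answer writes as the bytes `"\xC2\xA0"`.

```ruby
NBSP = "\u00A0" # non-breaking space (U+00A0)

# Replace each run of leading spaces or tabs on a line with the same
# number of non-breaking spaces. [^\S\r\n] matches whitespace other
# than line breaks, and ^ anchors at the start of every line in Ruby.
def preserve_indent(text)
  text.gsub(/^[^\S\r\n]+/) { |run| NBSP * run.size }
end

sample = "1. One\n  a. Inner\n    i. Deep\n"
result = preserve_indent(sample)

# Only leading whitespace changes; spaces inside a line are untouched.
puts result.inspect
```

Because the replacement keeps the character count of each run, the relative indentation between nesting levels stays the same.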
RULES = {
ol: {
1 => ->(index) { "#{index + 1}. " },
2 => ->(index) { "#{}" },
3 => ->(index) { "#{}" },
4 => ->(index) { "#{}" }
},
ul: {
1 => ->(_) { "\u2022 " },
2 => ->(_) { "" },
3 => ->(_) { "" },
4 => ->(_) { "" },
},
space: {
1 => ->(index) { " " },
2 => ->(index) { " " },
3 => ->(index) { " " },
4 => ->(index) { " " },
}
}
def ol_rule(group, deepness: 1)
group.search('> li').each_with_index do |item, i|
prefix = RULES[:ol][deepness].call(i)
space = RULES[:space][deepness].call(i)
item.add_previous_sibling(space)
item.prepend_child(prefix)
descend(item, deepness + 1)
end
end
def ul_rule(group, deepness: 1)
group.search('> li').each_with_index do |item, i|
space = RULES[:space][deepness].call(i)
prefix = RULES[:ul][deepness].call(i)
item.add_previous_sibling(space)
item.prepend_child(prefix)
descend(item, deepness + 1)
end
end
def descend(item, deepness)
item.search('> ol').each do |ol|
ol_rule(ol, deepness: deepness)
end
item.search('> ul').each do |ul|
ul_rule(ul, deepness: deepness)
end
end
doc = Nokogiri::HTML.parse(text)
doc.search('ol').each do |group|
ol_rule(group, deepness: 1)
end
doc.search('ul').each do |group|
ul_rule(group, deepness: 1)
end
Prawn::Document.generate("hello.pdf") do
# puts doc.inner_text
html = doc.at('body').children.to_html
# Replace leading whitespace with non-breaking spaces so the indentation survives in the PDF
html = html.gsub(/^([^\S\r\n]+)/m) { |m| "\xC2\xA0" * m.size }
# Strip the list tags, drop literal "\n" sequences, and collapse repeated newlines
html = html.gsub(%r{</?(?:ul|ol|li)>}, "").gsub("\\n", "").gsub(/\n+/, "\n")
text html
end