How to avoid Nokogiri encoding href content?

Tags: ruby, nokogiri

I have this code:

n = Nokogiri::HTML::DocumentFragment.parse("<a href='{{var_name}}'>click</a>")

When I call n.to_html, the {{ }} comes back percent-encoded:

"<a href=\"%7B%7Bvar_name%7D%7D\">click</a>"

I want to avoid that, because I need to parse it with a template engine.

How can I tell Nokogiri not to encode the "href" content?

Juanjo Conti asked Oct 19 '25 11:10


1 Answer

I don't think it's possible to tell Nokogiri not to percent-encode attribute values such as href when it serializes HTML. It's a parser following rules, but that doesn't mean we have to accept its output:

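require 'nokogiri'

# Map each percent-encoded sequence back to the literal character.
REGEX_HASH = {
  '%7B' => '{',
  '%7D' => '}'
}.freeze

# Build one pattern that matches any of the encoded sequences.
REGEX = Regexp.union(REGEX_HASH.keys)
# => /%7B|%7D/

doc = Nokogiri::HTML::DocumentFragment.parse("<a href='{{var_name}}'>click</a>")
doc.to_html
# => "<a href=\"%7B%7Bvar_name%7D%7D\">click</a>"

# gsub with a hash looks up each match and substitutes its value.
# Note: this rewrites every %7B and %7D in the output, not only those in href.
fixed_html = doc.to_html.gsub(REGEX, REGEX_HASH)
# => "<a href=\"{{var_name}}\">click</a>"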
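
If decoding every %7B and %7D is too broad, for example because some real URLs in the document legitimately contain encoded braces, a narrower variant is to restore only complete {{ ... }} placeholders. The sketch below is plain Ruby working on the already serialized string. The names PLACEHOLDER and restore_placeholders are made up for illustration and are not part of Nokogiri, and it assumes the placeholder name contains no characters that would themselves be percent-encoded (such as spaces).

```ruby
# Hypothetical helper, not a Nokogiri API: restores only full
# {{ ... }} placeholders that were percent-encoded during serialization.
PLACEHOLDER = /%7B%7B(.*?)%7D%7D/i

def restore_placeholders(html)
  # The block receives each encoded placeholder; the captured group is the
  # placeholder name, which is wrapped in literal double braces again.
  html.gsub(PLACEHOLDER) { '{{' + Regexp.last_match(1) + '}}' }
end

encoded = '<a href="%7B%7Bvar_name%7D%7D">click</a> <a href="/a%7Bb">x</a>'
puts restore_placeholders(encoded)
# prints: <a href="{{var_name}}">click</a> <a href="/a%7Bb">x</a>
```

With Nokogiri in the picture, this would be called as restore_placeholders(doc.to_html). The lone %7B in the second link is left alone because it is not part of a complete placeholder.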

However, if XHTML or XML output is acceptable, things get much simpler, because those serializers leave the braces alone:

doc = Nokogiri::HTML::DocumentFragment.parse("<a href='{{var_name}}'>click</a>")
doc.to_html  # => "<a href=\"%7B%7Bvar_name%7D%7D\">click</a>"
doc.to_xhtml # => "<a href=\"{{var_name}}\">click</a>"
doc.to_xml   # => "<a href=\"{{var_name}}\">click</a>"

the Tin Man answered Oct 21 '25 06:10


