
How to unescape HTML in Nokogiri Ruby, so & remains & and not &amp;

Tags: ruby, nokogiri

I have a title, doc.at('head/title').inner_html, that comes out with &amp; where it should have &.

My original document is:

<head><title>Foo & Bar</title></head>

but it comes out as the following:

>> doc = Nokogiri::HTML.parse(file, nil, "UTF-8")
>> doc.at('head/title')
=> #<Nokogiri::XML::Element:0x..fdb851bea name="title" children=#<Nokogiri::XML::Text:0x..fdb850808 "Foo & Bar">>
>> doc.at('head/title').inner_html
=> "Foo &amp; Bar"

I don't want to use Iconv or CGI like:

>> require 'cgi'
>> CGI.unescapeHTML(doc.at('head/title').inner_html)
=> "Foo & Bar"

because that is ugly and inconvenient.

asked Dec 31 '09 13:12 by pgericson


1 Answer

Use content instead of inner_html. inner_html serializes the node back to (X)HTML, which escapes & as &amp;, whereas content returns the node's text as plain text.

irb(main):011:0> doc.at('head/title').content
=> "Foo & Bar"
answered Nov 07 '22 06:11 by Ben James