I'm trying to parse an HTML fragment that contains a custom HTML tag using Nokogiri.
Example:
string = "<div>hello</div>\n<custom-tag></custom-tag>"
I tried to load it in several ways, but none of them works well.
If I use Nokogiri::HTML:
doc = Nokogiri::HTML(string)
When I call to_html, it adds a doctype and an html tag that wraps the content, which I don't want.
If I use Nokogiri::XML:
doc = Nokogiri::XML(string)
I get the error Error at line 2: Extra content at the end of the document, because in XML a single root tag must wrap all of the document's content. If I try to save this content again, the output is <div>hello</div> (every tag after the first is removed).
I also tried Nokogiri::HTML.fragment:
doc = Nokogiri::HTML.fragment(string)
But it complains about the custom-tag.
How can I make Nokogiri parse this HTML fragment correctly?
doc = Nokogiri::HTML.fragment(string)
is the way to go; you can ignore the entries in doc.errors that complain about the invalid tag.
You are giving it invalid HTML, so you can't expect it not to report errors, but HTML parsers tend to be forgiving and keep the element anyway.
You can also use Nokogiri::XML.fragment if you're sure the rest of the markup is well-formed. That won't give you errors about undefined tags.