Parse an HTML fragment whitelisting some custom tags

Question

I'm trying to parse an HTML fragment that contains a custom HTML tag using Nokogiri.

Example:

string = "<div>hello</div>
<custom-tag></custom-tag>"

I tried loading it in several ways, but none of them works well.

If I use Nokogiri::HTML:

doc = Nokogiri::HTML(string)

When I call to_html, it adds a doctype and wraps the content in an html tag, which I don't want.

If I use Nokogiri::XML:

doc = Nokogiri::XML(string)

I get "Error at line 2: Extra content at the end of the document", since XML requires a single root tag that wraps all the document content. If I serialize the document again, the output is <div>hello</div> (every tag after the first is removed).

I also tried Nokogiri::HTML.fragment:

doc = Nokogiri::HTML.fragment(string)

But it complains about the custom-tag.

How can I make Nokogiri parse this HTML fragment correctly?

Dmitri · Accepted Answer

doc = Nokogiri::HTML.fragment(string) is the way to go. You can ignore the entries in doc.errors that complain about the invalid tag.

You are giving it invalid HTML, so you can't expect it to not report errors, but HTML parsers tend to be forgiving.

You can also use Nokogiri::XML.fragment, if you're sure the rest of it is well-formed. That won't give you errors about undefined tags.

Tags: html, ruby, nokogiri

ProGM
