How to preserve case in jsoup parsing?

Tags: java, jsoup

I am using jsoup to parse some HTML content. After parsing, it converts camel-cased attributes to lowercase, so <svg viewBox='XXXX'> becomes <svg viewbox='XXXX'>.

Can someone suggest how I can preserve case when parsing HTML content with jsoup 1.8.1?

user3069716 asked Jul 14 '15 07:07


1 Answer

I just released jsoup 1.10.1, which includes support for preserving tag and/or attribute case. You control it with ParseSettings. By default, the HTML parser still normalizes tag and attribute names to lower case, and the XML parser preserves them. You can specify these settings when you create the parser.

To use the XML parser (which preserves case by default):

Document doc = Jsoup.parse(xml, baseUrl, Parser.xmlParser());

To use the HTML parser and set it to preserve case:

Parser parser = Parser.htmlParser();
parser.settings(new ParseSettings(true, true)); // tag, attribute preserve case
Document doc = parser.parseInput(html, baseUrl);
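
For anyone who wants to try both options side by side, here is a minimal sketch that puts the two snippets above into one runnable class. It assumes jsoup 1.10.1 or later is on the classpath. The class name PreserveCaseDemo and the helper method names are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.ParseSettings;
import org.jsoup.parser.Parser;

// Hypothetical demo class; only the org.jsoup calls come from the library.
class PreserveCaseDemo {

    // HTML parser configured to keep tag and attribute case as written.
    static String parseHtmlPreservingCase(String html) {
        Parser parser = Parser.htmlParser();
        parser.settings(new ParseSettings(true, true)); // tag, attribute preserve case
        Document doc = parser.parseInput(html, "");
        return doc.outerHtml();
    }

    // XML parser, which preserves case by default.
    static String parseAsXml(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        String markup = "<svg viewBox=\"0 0 10 10\"></svg>";
        System.out.println(parseHtmlPreservingCase(markup));
        System.out.println(parseAsXml(markup));
    }
}
```

In both cases the serialized output should still contain viewBox with its original capitalization. The HTML parser variant also wraps the fragment in the usual html, head and body elements, while the XML parser variant does not.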
Jonathan Hedley answered Oct 10 '22 21:10