
Jsoup changes output from single quote to double quote on HTML attributes

We are using Jsoup to parse, manipulate and extend an HTML template. Everything works fine until single quotes are used to delimit HTML attribute values:

<span data-attr='JSON'></span>

That HTML snippet is converted to

<span data-attr="JSON"></span>

This conflicts with the JSON data inside the attribute, because JSON is only valid with double-quoted strings:

{"param" : "value"} //valid
{'param' : 'value'} //invalid

So we need to force Jsoup NOT to change those single quotes to double quotes, but how? This is the code we currently use to parse the template and produce the HTML output:

pageTemplate = Jsoup.parse(new File(mainTemplateFilePath), "UTF-8");
pageTemplate.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
pageTemplate.outputSettings().charset("UTF-8");

... adding some html 

pageTemplate.html(); // will output the double quoted attributes :(
MatthiasLaug asked Nov 29 '12 16:11



1 Answer

You need to HTML-encode the JSON value before putting it into the data-attr attribute. Once encoded, the markup looks like this:

<span data-attr="{&quot;param&quot;:&quot;value&quot;}"></span>

Although that looks fairly daunting, it is valid HTML. The browser decodes each &quot; entity back into a " character when it parses the attribute, so when your JavaScript calls someSpan.getAttribute("data-attr") it receives the original, valid JSON string.
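To make the encoding step concrete, here is a minimal sketch in plain Java with no dependencies. The class JsonAttributeDemo and the helpers escapeAttribute and unescapeAttribute are hypothetical names invented for this illustration. They are not part of Jsoup and only cover the four characters shown.

```java
// Hypothetical demo class; none of these names come from Jsoup.
final class JsonAttributeDemo {

    // Assumed helper: HTML-encode a value so it can sit inside a
    // double-quoted attribute without ending the attribute early.
    static String escapeAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Assumed inverse: roughly what an HTML parser does when it reads the
    // attribute back. "&amp;" is decoded last so that an escaped ampersand
    // is never mistaken for the start of another entity.
    static String unescapeAttribute(String value) {
        return value.replace("&quot;", "\"")
                    .replace("&lt;", "<")
                    .replace("&gt;", ">")
                    .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String json = "{\"param\":\"value\"}";
        String encoded = escapeAttribute(json);

        // The JSON keeps its double quotes; only their HTML spelling changes.
        System.out.println("<span data-attr=\"" + encoded + "\"></span>");

        // Decoding yields the original JSON string again.
        System.out.println(unescapeAttribute(encoded));
    }
}
```

With this approach the attribute delimiter no longer matters: the JSON survives the round trip whether the serializer writes single or double quotes around the attribute value.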

Chris Nielsen answered Oct 12 '22 23:10
