I did some research and it seems that is standard Jsoup make this change. I wonder if there is a way to configure this or is there some other Parser I can be converted to a document of Jsoup, or some way to fix this?
Unfortunately not, the constructor of Tag
class changes the name to lower case:
private Tag(String tagName) {
this.tagName = tagName.toLowerCase();
}
But there are two ways to change this behavour:
Example for #2:
Field tagName = Tag.class.getDeclaredField("tagName"); // Get the field which contains the tagname
tagName.setAccessible(true); // Set accessible to allow changes
for( Element element : doc.select("*") ) // Iterate over all tags
{
Tag tag = element.tag(); // Get the tag of the element
String value = tagName.get(tag).toString(); // Get the value (= name) of the tag
if( !value.startsWith("#") ) // You can ignore all tags starting with a '#'
{
tagName.set(tag, value.toUpperCase()); // Set the tagname to the uppercase
}
}
tagName.setAccessible(false); // Revert to false
You must use xmlParser instead of htmlParser and the tags will remain unchanged. One line does the trick:
String html = "<camelCaseTag>some text</camelCaseTag>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
Here is a code sample (version >= 1.11.x):
Parser parser = Parser.htmlParser();
parser.settings(new ParseSettings(true, true));
Document doc = parser.parseInput(html, baseUrl);
There is ParseSettings class introduced in version 1.9.3. It comes with options to preserve case for tags and attributes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With