I am having difficulties with the Jsoup parser. How can I tell whether a given string is valid HTML?
String input = "Your vote was successfully added.";
boolean isValid = Jsoup.isValid(input);
// isValid = true
The isValid flag is true because Jsoup first uses HtmlTreeBuilder: if any of the html, head or body tags is missing, it adds them itself. Then it uses the Cleaner class and checks the result against the given Whitelist.
Is there a simple way to check whether a string is valid HTML without Jsoup attempting to turn it into HTML?
My example is an AJAX response that arrives with the "text/html" content type. It then goes to the parser, Jsoup adds these tags, and as a result the response is not displayed properly.
Thanks for your help.
First of all, the solution proposed by Reuben does not work as expected. The pattern has to be compiled with the Pattern.DOTALL flag, because the input HTML may (and probably will) contain newline characters.
So it should be something like this:
Pattern htmlPattern = Pattern.compile(".*\\<[^>]+>.*", Pattern.DOTALL);
boolean isHTML = htmlPattern.matcher(input).matches();
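To see why the flag matters, here is a minimal sketch that compiles the same expression with and without Pattern.DOTALL and runs both against a multi-line input. The class and method names (DotAllDemo, matchesWithDotAll, matchesWithoutDotAll) are made up for illustration; only java.util.regex is used.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Same expression as above, compiled once without and once with DOTALL.
    static final Pattern WITHOUT_DOTALL = Pattern.compile(".*\\<[^>]+>.*");
    static final Pattern WITH_DOTALL = Pattern.compile(".*\\<[^>]+>.*", Pattern.DOTALL);

    // Without DOTALL, '.' does not match line terminators, so matches()
    // cannot cover a whole multi-line string.
    static boolean matchesWithoutDotAll(String input) {
        return WITHOUT_DOTALL.matcher(input).matches();
    }

    // With DOTALL, '.' also matches line terminators.
    static boolean matchesWithDotAll(String input) {
        return WITH_DOTALL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String multiLine = "<html>\n<body>\n</body>\n</html>";
        System.out.println(matchesWithoutDotAll(multiLine)); // false
        System.out.println(matchesWithDotAll(multiLine));    // true
    }
}
```

Keep in mind that this only tells you that the string contains something shaped like a tag. It does not validate the markup.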
I also think that this pattern should look for the <html> tag specifically, not just any tag. Next: a bare <html> is not the only valid form. The opening tag may also carry attributes, and this has to be handled as well.
I chose to modify the Jsoup source. If HtmlTreeBuilder (actually the BeforeHtml state) tries to add an <html> element, I throw a ParseException, and then I am sure that the input was not a valid HTML file.
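If you want to stay with a regex, a rough sketch of that idea could look like the following. It searches for an opening html tag, with or without attributes, anywhere in the input. The names HtmlTagCheck and hasHtmlTag are hypothetical, the lang attribute in the sample input is only an illustration, and this is a heuristic rather than real validation.

```java
import java.util.regex.Pattern;

public class HtmlTagCheck {
    // "<html" followed either directly by ">" or by whitespace and
    // arbitrary attribute text up to the closing ">". Case-insensitive.
    static final Pattern HTML_TAG =
            Pattern.compile("<html(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // find() searches anywhere in the input, so newlines around the tag
    // do not matter and no leading/trailing ".*" is needed.
    static boolean hasHtmlTag(String input) {
        return HTML_TAG.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasHtmlTag("<html><body></body></html>"));             // true
        System.out.println(hasHtmlTag("<html lang=\"en\">\n<body></body></html>")); // true
        System.out.println(hasHtmlTag("<p>Your vote was successfully added.</p>")); // false
    }
}
```

A fragment such as an AJAX response with a few tags but no html element is reported as false here, which is the distinction the question is after.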
Use a regex to check whether the String contains HTML or not:
boolean isHTML = input.matches(".*\\<[^>]+>.*");
If your String contains HTML, it will return true, for example:
String input = "<html><body></body></html>";
But this input will return false:
String input = "Hello World <>";