I have the following code, which works, but I want to know whether Jsoup can pinpoint the exact cause of a validation failure.
The following returns true (as expected):
private void validateProtocol() {
    String html = "<p><a href='https://example.com/'>Link</a></p>";
    Whitelist whiteList = Whitelist.basic();
    whiteList.addProtocols("a", "href", "tel");
    whiteList.removeProtocols("a", "href", "ftp");
    boolean safe = Jsoup.isValid(html, whiteList);
    System.out.println(safe);
}
When I change the string above to the following, it returns false (as expected):
String html = "<p><a href='ftp://example.com/'>Link</a></p>";
Now, with the following code, there are two errors: one is an invalid protocol and the other is the onfocus attribute on the link.
private void validateProtocol() {
    String html = "<p><a href='ftp://example.com/' onfocus='invalidLink()'>Link</a></p>";
    Whitelist whiteList = Whitelist.basic();
    whiteList.addProtocols("a", "href", "tel", "device");
    whiteList.removeProtocols("a", "href", "ftp");
    boolean safe = Jsoup.isValid(html, whiteList);
    System.out.println(safe);
}
The result is false, but is there any way to find out which part of the link is invalid? For example, a wrong protocol or a wrong method?
Jsoup.isValid only returns a boolean, so to see what was rejected you can create a custom whitelist with a reporting feature.
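import java.util.HashSet;
import java.util.Set;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Whitelist;

public class MyReportEnabledWhitelist extends Whitelist {
    // The same attribute can reach isSafeAttribute more than once during one
    // validation, so remember what was already reported.
    private final Set<String> alreadyCheckedAttributeSignatures = new HashSet<>();

    @Override
    protected boolean isSafeTag(String tag) {
        boolean isSafe = super.isSafeTag(tag);
        if (!isSafe) {
            say("Disallowed tag: " + tag);
        }
        return isSafe;
    }

    @Override
    protected boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
        boolean isSafe = super.isSafeAttribute(tagName, el, attr);
        String signature = el.hashCode() + "-" + attr.hashCode();
        // Set.add returns true only the first time a signature is seen.
        if (alreadyCheckedAttributeSignatures.add(signature) && !isSafe) {
            say("Wrong attribute: " + attr.getKey() + " (" + attr.html() + ") in " + el.outerHtml());
        }
        return isSafe;
    }

    // Reporting hook; prints to stdout here, replace with a logger if needed.
    private void say(String message) {
        System.out.println(message);
    }
}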
Usage:
String html = "<p><a href='ftp://example.com/' onfocus='invalidLink()'>Link</a></p><a href='ftp://example2.com/'>Link 2</a>";
// * Custom whitelist
Whitelist myReportEnabledWhitelist = new MyReportEnabledWhitelist()
        // ** Basic whitelist (from Jsoup)
        .addTags("a", "b", "blockquote", "br", "cite", "code", "dd", "dl", "dt", "em", "i", "li", "ol", "p", "pre", "q", "small", "span",
                "strike", "strong", "sub", "sup", "u", "ul") //
        .addAttributes("a", "href") //
        .addAttributes("blockquote", "cite") //
        .addAttributes("q", "cite") //
        .addProtocols("a", "href", "ftp", "http", "https", "mailto") //
        .addProtocols("blockquote", "cite", "http", "https") //
        .addProtocols("cite", "cite", "http", "https") //
        .addEnforcedAttribute("a", "rel", "nofollow") //
        // ** Customizations
        .addTags("body") //
        .addProtocols("a", "href", "tel", "device") //
        .removeProtocols("a", "href", "ftp");
boolean safeCustom = Jsoup.isValid(html, myReportEnabledWhitelist);
System.out.println(safeCustom);
Output:
Wrong attribute: href (href="ftp://example.com/") in <a href="ftp://example.com/" onfocus="invalidLink()">Link</a>
Wrong attribute: onfocus (onfocus="invalidLink()") in <a href="ftp://example.com/" onfocus="invalidLink()">Link</a>
Wrong attribute: href (href="ftp://example2.com/") in <a href="ftp://example2.com/">Link 2</a>
false
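The report above flags both problems as "Wrong attribute" and does not say whether the attribute itself is disallowed (the onfocus case) or only its protocol (the ftp case). If you want that distinction, here is a minimal sketch of how the reason could be classified. It is a simplified model in plain Java, not Jsoup's own logic: the class AttributeRejectionReason, the Reason enum and the classify method are hypothetical names, the allowed sets only mirror the configuration for the a tag used above, and it assumes absolute URLs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Jsoup: a simplified model of why an
// attribute on an <a> tag would be rejected by the whitelist configured above.
final class AttributeRejectionReason {

    enum Reason { OK, ATTRIBUTE_NOT_ALLOWED, PROTOCOL_NOT_ALLOWED }

    // Assumption: mirrors addAttributes("a", "href") from the usage example.
    static final Set<String> ALLOWED_ATTRIBUTES =
            new HashSet<>(Arrays.asList("href"));

    // Assumption: the protocols left for a/href after ftp was removed.
    static final Set<String> ALLOWED_PROTOCOLS =
            new HashSet<>(Arrays.asList("http", "https", "mailto", "tel", "device"));

    static Reason classify(String attrName, String attrValue) {
        // First question: is the attribute allowed on the tag at all?
        if (!ALLOWED_ATTRIBUTES.contains(attrName.toLowerCase(Locale.ROOT))) {
            return Reason.ATTRIBUTE_NOT_ALLOWED;
        }
        // Second question: does the value start with an allowed "protocol:" prefix?
        String value = attrValue.trim().toLowerCase(Locale.ROOT);
        for (String protocol : ALLOWED_PROTOCOLS) {
            if (value.startsWith(protocol + ":")) {
                return Reason.OK;
            }
        }
        return Reason.PROTOCOL_NOT_ALLOWED;
    }

    public static void main(String[] args) {
        System.out.println(classify("href", "https://example.com/")); // OK
        System.out.println(classify("href", "ftp://example.com/"));   // PROTOCOL_NOT_ALLOWED
        System.out.println(classify("onfocus", "invalidLink()"));     // ATTRIBUTE_NOT_ALLOWED
    }
}
```

A check like this could be called from the reporting branch of isSafeAttribute to word the message more precisely, as long as its allowed sets are kept in sync with the whitelist configuration.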