I have tried:
Whitelist.relaxed();
Whitelist.relaxed().preserveRelativeLinks(true);
Whitelist.relaxed().addProtocols("a","href","#","/","http","https","mailto","ftp");
Whitelist.relaxed().addProtocols("a","href","#","/","http","https","mailto","ftp").preserveRelativeLinks(true);
None of them work. When I try to clean a relative URL such as <a href="/test.xhtml">test</a>, the href attribute is removed (<a>test</a>).
I am using JSoup 1.8.2.
Any ideas?
The problem most likely stems from how the clean method is called. If you pass a base URI, everything should work as expected:
String html = ""
+ "<a href=\"/test.xhtml\">test</a>"
+ "<invalid>stuff</invalid>"
+ "<h2>header1</h2>";
String cleaned = Jsoup.clean(html, "http://base.uri", Whitelist.relaxed().preserveRelativeLinks(true));
System.out.println(cleaned);
The above works and keeps the relative link. With String cleaned = Jsoup.clean(html, Whitelist.relaxed().preserveRelativeLinks(true)), however, the href attribute is removed.
Note the documentation of Whitelist.preserveRelativeLinks(true):
Note that when handling relative links, the input document must have an appropriate base URI set when parsing, so that the link's protocol can be confirmed. Regardless of the setting of the preserve relative links option, the link must be resolvable against the base URI to an allowed protocol; otherwise the attribute will be removed.
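To illustrate the rule quoted above, here is a rough sketch using only java.net.URI. It is not jsoup's actual implementation; the class RelativeLinkCheck and the helper hasAllowedProtocol are hypothetical names made up for this example. It only shows why a relative link cannot be matched against an allowed protocol unless there is a base URI to resolve it against.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustration only: this class is hypothetical and not part of jsoup.
class RelativeLinkCheck {

    // Resolves href against baseUri (if one is given) and reports whether
    // the resulting URI has a scheme contained in the allowed set.
    static boolean hasAllowedProtocol(String baseUri, String href, Set<String> allowed) {
        try {
            URI resolved = baseUri.isEmpty()
                    ? new URI(href)                  // nothing to resolve against
                    : new URI(baseUri).resolve(href);
            String scheme = resolved.getScheme();    // null for a relative URI
            return scheme != null && allowed.contains(scheme.toLowerCase());
        } catch (URISyntaxException | IllegalArgumentException e) {
            return false;                            // unparsable link: treat as not allowed
        }
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("http", "https", "mailto", "ftp"));

        // With a base URI, "/test.xhtml" resolves to http://base.uri/test.xhtml.
        System.out.println(hasAllowedProtocol("http://base.uri", "/test.xhtml", allowed)); // true

        // Without a base URI the link stays relative, so it has no protocol to check.
        System.out.println(hasAllowedProtocol("", "/test.xhtml", allowed)); // false
    }
}
```

This would also explain why adding "#" or "/" via addProtocols does not help: the check is made on the protocol of the resolved link, and without a base URI there is none.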