I'm using Jsoup to get HTML from web sites. I'm using
String url="http://www.example.com";
Document doc=Jsoup.connect(url).get();
this code to get the HTML. But when I use Turkish letters in the link, like this:
String url="http://www.example.com/?q=Türkçe";
Document doc=Jsoup.connect(url).get();
Jsoup sends the request like this: "http://www.example.com/?q=Trke"
So I can't get the correct result. How can I solve this problem?
Working solution: if the encoding is UTF-8, simply use
Document document = Jsoup.connect("http://www.example.com")
        .data("q", "Türkçe")
        .get();
with result
URL=http://www.example.com?q=T%C3%BCrk%C3%A7e
For a custom encoding, this can be used:
String encodedUrl = URLEncoder.encode("http://www.example.com/q=Türkçe", "ISO-8859-3");
String encodedBaseUrl = URLEncoder.encode("http://www.example.com/q=", "ISO-8859-3");
String query = encodedUrl.replace(encodedBaseUrl, "");
Document doc = Jsoup.connect("http://www.example.com")
        .data("q", query)
        .get();
Unicode characters are not allowed in URLs as per the specification. We're used to seeing them because browsers display them in address bars, but they are not sent to servers in that form; the browser percent-encodes them first.
You have to URL-encode your path before passing it to Jsoup.
Jsoup.connect("http://www.example.com").data("q", "Türkçe")
as proposed by MariuszS does just that
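To make the encoding step concrete, here is a minimal sketch that uses only the JDK's URLEncoder, without Jsoup. The class name QueryEncodingSketch and the helper encodeValue are made up for illustration. ISO-8859-1 is used as the second charset only because every JVM is guaranteed to support it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryEncodingSketch {

    // Hypothetical helper (not part of Jsoup): percent-encodes one query
    // value using the named charset.
    static String encodeValue(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "Türkçe", written with Unicode escapes so the result does not
        // depend on the encoding of the source file.
        String value = "T\u00fcrk\u00e7e";

        // In UTF-8, ü and ç each take two bytes.
        System.out.println("http://www.example.com/?q=" + encodeValue(value, "UTF-8"));
        // prints http://www.example.com/?q=T%C3%BCrk%C3%A7e

        // In ISO-8859-1, ü and ç each take one byte.
        System.out.println("http://www.example.com/?q=" + encodeValue(value, "ISO-8859-1"));
        // prints http://www.example.com/?q=T%FCrk%E7e
    }
}
```

The UTF-8 output matches the URL shown in the answer above. Which charset the target server expects is something you have to know or find out; the sketch only shows how the choice changes the bytes that end up in the URL.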
I found this on Google: http://turkishbasics.com/resources/turkish-characters-html-codes.php Maybe you can add it like this:
String url="http://www.example.com/?q=Türkçe";
Document doc=Jsoup.connect(url).get();