I have a list of URLs whose content I need to fetch. The URLs contain special characters and therefore need to be encoded. I use Commons HttpClient to get the content.
When I use:
GetMethod get = new GetMethod(url);
I get an "illegal escape character" exception. When I use:
GetMethod get = new GetMethod();
get.setURI(new URI(url.toString(), false, "UTF-8"));
I get a 404 when trying to fetch the page, because a space is turned into %2520 instead of just %20.
I've seen many posts about this problem, and most of them advise building the URI part by part. The problem is that this is a given list of URLs, not a single URL that I can handle manually.
Is there any other solution to this problem?
Thanks.
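The %2520 symptom is what double-encoding typically looks like: if a URL in the list already contains %20 and is escaped a second time, the % itself becomes %25. The sketch below reproduces that effect with the JDK's java.net.URI as a stand-in, not with the Commons HttpClient URI class used in the question. The host example.com, the paths, and the class name DoubleEncodingDemo are made up for illustration, and it is only an assumption that the URLs in the list are already partly escaped.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class DoubleEncodingDemo {

    // The multi-argument java.net.URI constructors quote illegal characters
    // and always quote '%', so an existing escape gets escaped again.
    static String encodePath(String path) {
        try {
            return new URI("http", "example.com", path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Raw space: encoded once.
        System.out.println(encodePath("/my page"));     // http://example.com/my%20page
        // Already escaped: '%' becomes %25, giving %2520.
        System.out.println(encodePath("/my%20page"));   // http://example.com/my%2520page
    }
}
```

If this is what is happening, the fix depends on knowing whether each URL in the list is raw or already escaped before any encoding is applied.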
What if you create a new URL object from its string, like URL urlObject = new URL(url), then call urlObject.getQuery() and urlObject.getPath() to split it correctly, parse the query parameters into a List or a Map, and do something like the following?
EDIT: I just found out that the HttpClient library has a URLEncodedUtils.parse() method, which you can use easily with the code provided below. I'll edit it to fit; however, it is untested.
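With Apache HttpClient it would be something like:
// HttpClient 4.x classes (org.apache.http.*); still untested
URL urlObject = new URL(url); // java.net.URL accepts unescaped characters such as spaces
HttpClient httpclient = new DefaultHttpClient();
// Split the raw query string into decoded name/value pairs
List<NameValuePair> formparams = URLEncodedUtils.parse(urlObject.getQuery(), Charset.forName("UTF-8"));
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(formparams, "UTF-8");
// Rebuild the target without the query; this URI constructor escapes illegal characters in the path
URI target = new URI(urlObject.getProtocol(), urlObject.getAuthority(), urlObject.getPath(), null, null);
HttpPost httppost = new HttpPost(target);
httppost.setEntity(entity);
httppost.addHeader("Content-Type", "application/x-www-form-urlencoded");
HttpResponse response = httpclient.execute(httppost);
HttpEntity entity2 = response.getEntity();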
With Java URLConnection it would be something like:
try {
    // Build the form body from the parameters parsed out of urlObject.getQuery().
    // yourMapOfValues is a placeholder Map<String, String> of decoded names to values.
    StringBuilder str = new StringBuilder();
    for (Map.Entry<String, String> param : yourMapOfValues.entrySet()) {
        if (str.length() > 0) str.append('&');
        str.append(URLEncoder.encode(param.getKey(), "UTF-8"))
           .append('=')
           .append(URLEncoder.encode(param.getValue(), "UTF-8"));
    }
    // Rebuild the target without its query string; getPath() alone would drop the scheme and host
    URL u = new URL(urlObject.getProtocol(), urlObject.getHost(), urlObject.getPort(), urlObject.getPath());
    URLConnection uc = u.openConnection();
    uc.setDoOutput(true); // send the parameters as a request body
    uc.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    PrintWriter pw = new PrintWriter(uc.getOutputStream());
    pw.print(str);
    pw.close();
    BufferedReader in = new BufferedReader(new InputStreamReader(uc.getInputStream()));
    String res = in.readLine(); // first line of the response
    in.close();
    // ...
} catch (IOException e) {
    // handle encoding, connection and I/O failures
}
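The "parse the query params into a List or a Map" step above is left as a placeholder, so here is a minimal JDK-only sketch of one way it could look. The class name QueryStrings and its two methods are hypothetical helpers, not part of any library, and the sketch assumes UTF-8 and that each parameter name appears only once.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class QueryStrings {

    // Splits "a=1&b=2" into decoded name/value pairs, keeping the original order.
    // A repeated name overwrites the earlier value; use a List of pairs if that matters.
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // Joins the pairs back into a form-encoded string (spaces become '+').
    static String encode(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = parse("q=x y&lang=en");
        System.out.println(encode(params)); // q=x+y&lang=en
    }
}
```

Because the values are decoded before being encoded again, a value that arrives as either "x y" or "x%20y" ends up encoded exactly once. The Charset overloads of URLDecoder.decode and URLEncoder.encode need Java 10 or later.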
If you need to manipulate request URIs, it is strongly advisable to use the URIBuilder class shipped with Apache HttpClient.
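For a whole list of URLs, the same build-it-from-components idea can be automated. The sketch below is not URIBuilder itself; it swaps in the JDK's java.net.URL to split each string and the multi-argument java.net.URI constructor to escape the parts. The class name RawUrlEncoder and the example URL are hypothetical. It assumes the input URLs are not escaped yet: any % already present would be escaped again.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class RawUrlEncoder {

    // Splits a raw URL into components and lets java.net.URI escape
    // illegal characters (such as spaces) in each component.
    static String encode(String rawUrl) {
        try {
            // new URL(String) is lenient about unescaped characters
            // (deprecated since Java 20, but still available).
            URL url = new URL(rawUrl);
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            // toASCIIString() also percent-encodes non-ASCII characters as UTF-8.
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode: " + rawUrl, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("http://example.com/a b/c?q=x y"));
        // http://example.com/a%20b/c?q=x%20y
    }
}
```

The returned string can then be passed to whichever HTTP client is in use, one URL at a time, without handling any of them manually.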