I have a problem downloading a file from a URL like www.example.com/example.pdf via a proxy and saving it to the filesystem in Java. Does anybody have an idea how this could work? If I get the InputStream, I can simply save it to the filesystem with this:
try (ReadableByteChannel rbc = Channels.newChannel(httpUrlConnetion.getInputStream());
     FileOutputStream fos = new FileOutputStream(file)) {
    // copy the whole response into the file; rbc and fos are closed automatically
    fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
}
But how do I get the InputStream of a URL via a proxy? If I do it like this:
SocketAddress addr = new InetSocketAddress("my.proxy.com", 8080);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
URL url = new URL("http://my.real.url.com/");
URLConnection conn = url.openConnection(proxy);
I get this exception:
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at app.model.mail.crawler.newimpl.FileLoader.getSourceOfSiteViaProxy(FileLoader.java:167)
at app.model.mail.crawler.newimpl.FileLoader.process(FileLoader.java:220)
at app.model.mail.crawler.newimpl.FileLoader.run(FileLoader.java:57)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Using the following code:
final HttpURLConnection httpUrlConnetion = (HttpURLConnection) website.openConnection(proxy);
httpUrlConnetion.setDoOutput(true);
httpUrlConnetion.setDoInput(true);
httpUrlConnetion.setRequestProperty("Content-type", "text/xml");
httpUrlConnetion.setRequestProperty("Accept", "text/xml, application/xml");
httpUrlConnetion.setRequestMethod("POST");
httpUrlConnetion.connect();
I am able to download the source of an HTML page, but not a file. Maybe someone could help me with the properties I have to set for downloading a file.
To set a proxy programmatically:
SocketAddress addr = new InetSocketAddress("my.proxy.com", 8080);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
URL url = new URL("http://my.real.url.com/");
URLConnection conn = url.openConnection(proxy);
Then you can use your code above with the URLConnection returned on the last line. You can also use a SOCKS proxy (Proxy.Type.SOCKS), or force a direct connection with Proxy.NO_PROXY, if you so desire.
This was taken (and slightly edited) from this Oracle documentation.
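Putting these pieces together for the original question, here is a minimal sketch that downloads a binary file through an HTTP proxy with HttpURLConnection and saves it with Files.copy. It assumes a plain GET is enough: unlike the POST with an XML Content-type in the question, a file download normally needs neither setDoOutput(true) nor a request body. The class name ProxyDownload, the timeout values and the proxy host and URL in main are hypothetical placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ProxyDownload {

    // Downloads url to target through the given proxy with a plain GET request
    // and returns the number of bytes written.
    static long download(URL url, Proxy proxy, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
        conn.setRequestMethod("GET");   // no request body, so no setDoOutput(true)
        conn.setConnectTimeout(10_000); // milliseconds; example values only
        conn.setReadTimeout(30_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                // Files.copy streams raw bytes, so binary files such as PDFs stay intact
                return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical proxy and file URL: replace with your own values
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("my.proxy.com", 8080));
        long bytes = download(new URL("http://www.example.com/example.pdf"), proxy, Paths.get("example.pdf"));
        System.out.println("Saved " + bytes + " bytes");
    }
}
```

Checking the status code before reading makes a proxy error page (for example a 407 when authentication is required) show up as an exception instead of silently ending up in the saved file.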
Another approach is to implement the proxy "inside" each instance of httpUrlConnection. That is:
If it works, the connection will transparently send the file to you.
I have some code that worked with Sockets.
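import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

try (Socket sock = new Socket("10.0.241.1", 3128); // proxy IP and port
     InputStream is = sock.getInputStream();
     OutputStream os = sock.getOutputStream()) {
    String str = "GET http://www.uol.com.br/ HTTP/1.1\r\n"; // GET your site (absolute URL)
    str += "Host: www.uol.com.br\r\n"; // again, host of your site
    str += "Proxy-Authorization: Basic ZWR1YXJkby5wb2NvOmM1NmQyMw==\r\n"; // only if the proxy needs a password
    str += "Connection: close\r\n"; // otherwise read() may block on a kept-alive connection instead of returning -1
    str += "\r\n";
    os.write(str.getBytes(StandardCharsets.US_ASCII));
    os.flush();
    byte[] bb = new byte[1024];
    int L = 0;
    while ((L = is.read(bb)) != -1) {
        // write bb[0..L) to your file stream...
        // note: the first bytes are the status line and response headers, not the file
    }
} catch (Exception ex) {
    // exception handling...
}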
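With raw sockets, two details are easy to get wrong: the Proxy-Authorization value is just "Basic " followed by the Base64 encoding of user:password, and the bytes read from the socket start with the status line and response headers, which must not end up in the saved file. Below is a minimal sketch of both, with hypothetical helper names, assuming the body is not sent with chunked transfer encoding (for example because the response is HTTP/1.0 or carries a Content-Length).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RawProxyHelpers {

    // Builds the value of a Proxy-Authorization header for HTTP Basic auth:
    // "Basic " + Base64("user:password").
    static String basicAuth(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the status line and headers up to the first blank line (CRLF CRLF),
    // returns them as text, and copies the remaining bytes (the body) to out.
    // Assumes the body is not chunked.
    static String copyBodySkippingHeaders(InputStream in, OutputStream out) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of "\r\n\r\n" have been seen in a row
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            head.write(b);
            boolean expectedCr = b == '\r' && (matched == 0 || matched == 2);
            boolean expectedLf = b == '\n' && (matched == 1 || matched == 3);
            if (expectedCr || expectedLf) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0; // a stray CR may start a new terminator
            }
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(head.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

Because the header scan reads one byte at a time, it is worth wrapping sock.getInputStream() in a BufferedInputStream before calling copyBodySkippingHeaders, and checking that the returned header text starts with a 200 status before trusting the saved file.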
"Why would somebody use pure sockets when one could use HttpURLConnection?", you say. Well, at that time I didn't know about HttpURLConnection.
You can also use the Apache HttpClient library, which solves most of the issues with proxies. To compile the code below, you can use the following Maven configuration.
Maven:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>stackoverflow.test</groupId>
<artifactId>proxyhttp</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>proxy</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.1</version>
</dependency>
</dependencies>
</project>
Java code:
import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
/**
* How to send a request via proxy.
*
* @since 4.0
*/
public class ClientExecuteProxy {
public static void main(String[] args) throws Exception {
CloseableHttpClient httpclient = HttpClients.createDefault();
try {
HttpHost target = new HttpHost("www.google.com", 80, "http");
HttpHost proxy = new HttpHost("127.0.0.1", 8889, "http");
RequestConfig config = RequestConfig.custom()
.setProxy(proxy)
.build();
HttpGet request = new HttpGet("/");
request.setConfig(config);
System.out.println("Executing request " + request.getRequestLine() + " to " + target + " via " + proxy);
CloseableHttpResponse response = httpclient.execute(target, request);
try {
System.out.println("----------------------------------------");
System.out.println(response.getStatusLine());
System.out.println(EntityUtils.toString(response.getEntity()));
} finally {
response.close();
}
} finally {
httpclient.close();
}
}
}
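The HttpClient example above prints the response as a string, which would corrupt a binary file such as a PDF. Here is a minimal sketch of the same proxy setup adapted to save the response body to disk, assuming HttpClient 4.5.x as declared in the POM; the class name, the download helper and the URL and file name are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class ClientDownloadViaProxy {

    // Downloads fileUrl to target through proxy and returns the HTTP status code.
    static int download(String fileUrl, HttpHost proxy, Path target) throws IOException {
        RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
        // the proxy is set once as the client's default, so every request uses it
        try (CloseableHttpClient httpclient = HttpClients.custom().setDefaultRequestConfig(config).build()) {
            HttpGet request = new HttpGet(fileUrl);
            try (CloseableHttpResponse response = httpclient.execute(request)) {
                int status = response.getStatusLine().getStatusCode();
                HttpEntity entity = response.getEntity();
                if (status == 200 && entity != null) {
                    // writeTo streams the raw bytes, unlike EntityUtils.toString which decodes text
                    try (OutputStream out = Files.newOutputStream(target)) {
                        entity.writeTo(out);
                    }
                } else {
                    EntityUtils.consume(entity); // discard the error body and release the connection
                }
                return status;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file URL; proxy address taken from the example above
        HttpHost proxy = new HttpHost("127.0.0.1", 8889, "http");
        int status = download("http://www.example.com/example.pdf", proxy, Paths.get("example.pdf"));
        System.out.println("Status " + status);
    }
}
```

Setting the config per request with request.setConfig(config), as in the original example, should work just as well; the default-config variant simply avoids repeating it when downloading several files.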