
How to specify User Agent and Referer in FileUtils.copyURLToFile(URL, File) method?

I'm using FileUtils.copyURLToFile(URL, File), part of Apache Commons IO 2.4, to download a file and save it on my computer. The problem is that some sites refuse the connection when the request has no Referer and User-Agent headers.

My questions:

  1. Is there any way to specify the User-Agent and Referer for the copyURLToFile method?
  2. Or should I use another approach to download the file and then save the resulting InputStream to a file?

asked by Mike, Mar 14 '16 at 18:03


2 Answers

I re-implemented the download with HttpComponents making the request instead of Commons IO (Commons IO is still used to write the stream to disk). This code downloads a file in Java from its URL and saves it to the specified destination.

The final code:

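import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.apache.commons.io.FileUtils;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public static boolean saveFile(URL imgURL, String imgSavePath) {

    HttpGet httpGet = new HttpGet(imgURL.toString());
    // Headers that some sites require before they will serve the file
    httpGet.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.11 Safari/537.36");
    httpGet.addHeader("Referer", "https://www.google.com");

    // try-with-resources closes the response and the client, which also releases the connection
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
         CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {

        HttpEntity imageEntity = httpResponse.getEntity();

        if (imageEntity != null) {
            FileUtils.copyInputStreamToFile(imageEntity.getContent(), new File(imgSavePath));
        }
        // Note: the HTTP status code is not checked, so an error page would be saved as well
        return true;

    } catch (IOException e) {
        return false;
    }
}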

Of course, the code above takes more space than a single line of code:

FileUtils.copyURLToFile(imgURL, new File(imgSavePath),
                        URLS_FETCH_TIMEOUT, URLS_FETCH_TIMEOUT);

but it gives you more control over the process and lets you specify not only timeouts but also the User-Agent and Referer values, which are critical for many websites.
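
If adding HttpComponents is not an option, the same two headers can probably be set with the JDK's own java.net.URLConnection as well. The following is only a minimal sketch under that assumption, not part of the original solution: the class name HeaderDownload, the method name download and its parameters are made up for illustration, and it writes the file with java.nio.file.Files.copy rather than Commons IO.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, named here only for illustration.
public class HeaderDownload {

    // Opens the URL with the given User-Agent and Referer headers and
    // copies the response body to target, replacing any existing file.
    public static void download(URL url, Path target, String userAgent,
                                String referer, int timeoutMillis) throws IOException {
        URLConnection conn = url.openConnection();

        // Request headers must be set before the connection is opened
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setRequestProperty("Referer", referer);

        // Same role as the two timeout arguments of copyURLToFile (milliseconds)
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);

        // getInputStream() performs the request; the stream is closed automatically
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A call would look like HeaderDownload.download(imgURL, Paths.get(imgSavePath), userAgent, "https://www.google.com", URLS_FETCH_TIMEOUT). For HTTP URLs, getInputStream() throws an IOException on 4xx and 5xx responses, so a refused request surfaces as an exception instead of a saved error page.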


answered by Mike, Nov 14 '22 at 21:11


Completing the accepted answer on how to handle timeouts:

If you want to set timeouts, you have to create the CloseableHttpClient like this:

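// All timeouts are in milliseconds
RequestConfig config = RequestConfig.custom()
                 .setConnectTimeout(connectionTimeout)          // time to establish the connection
                 .setConnectionRequestTimeout(readDataTimeout)  // time to wait for a connection from the pool
                 .setSocketTimeout(readDataTimeout)             // maximum inactivity while reading data
                 .build();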

CloseableHttpClient httpClient = HttpClientBuilder
                 .create()
                 .setDefaultRequestConfig(config)
                 .build();

It may also be a good idea to create your CloseableHttpClient in a try-with-resources statement so that it is closed automatically:

try (CloseableHttpClient httpClient = HttpClientBuilder.create().setDefaultRequestConfig(config).build()) {
  // ... rest of the code using httpClient
}

answered by Aldo Canepa, Nov 14 '22 at 23:11