
How do you Programmatically Download a Webpage in Java



I'd use a decent HTML parser like Jsoup. It's then as easy as:

String html = Jsoup.connect("http://stackoverflow.com").get().html();

It handles GZIP, chunked responses, and character encoding fully transparently. It offers more advantages as well, like HTML traversal and manipulation with CSS selectors, much as jQuery does. You only have to grab it as a Document, not as a String.

Document document = Jsoup.connect("http://google.com").get();

You really don't want to process HTML with basic String methods, or even regex.
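
As a quick illustration of the selector style (a minimal sketch; the URL and selector are just examples):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupLinksExample {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the page in one call
        Document document = Jsoup.connect("http://stackoverflow.com").get();

        // Select every anchor with an href attribute, jQuery-style
        for (Element link : document.select("a[href]")) {
            System.out.println(link.attr("abs:href") + " -> " + link.text());
        }
    }
}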

See also:

  • What are the pros and cons of leading HTML parsers in Java?

Here's some tested code using Java's URL class. I'd recommend doing a better job than I do here of handling the exceptions or passing them up the call stack, though.

public static void main(String[] args) {
    URL url;
    InputStream is = null;
    BufferedReader br;
    String line;

    try {
        url = new URL("http://stackoverflow.com/");
        is = url.openStream();  // throws an IOException
        br = new BufferedReader(new InputStreamReader(is));

        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    } catch (MalformedURLException mue) {
         mue.printStackTrace();
    } catch (IOException ioe) {
         ioe.printStackTrace();
    } finally {
        try {
            if (is != null) is.close();
        } catch (IOException ioe) {
            // nothing to see here
        }
    }
}
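
If you'd rather pass the exception up the call stack, a try-with-resources variant of the same idea looks roughly like this (a sketch; it assumes the page is UTF-8 encoded):

public static void printPage(String address) throws IOException {
    URL url = new URL(address);
    // try-with-resources closes the reader and the underlying stream automatically
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(url.openStream(), "UTF-8"))) {
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    }
}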

Bill's answer is very good, but you may want to do some things with the request, like compression or user-agents. The following code shows how to accept various types of compression in your requests.

URL url = new URL(urlStr);
HttpURLConnection conn = (HttpURLConnection) url.openConnection(); // Cast shouldn't fail
HttpURLConnection.setFollowRedirects(true);
// allow both GZip and Deflate (ZLib) encodings
conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
String encoding = conn.getContentEncoding();
InputStream inStr = null;

// create the appropriate stream wrapper based on
// the encoding type
if (encoding != null && encoding.equalsIgnoreCase("gzip")) {
    inStr = new GZIPInputStream(conn.getInputStream());
} else if (encoding != null && encoding.equalsIgnoreCase("deflate")) {
    inStr = new InflaterInputStream(conn.getInputStream(),
      new Inflater(true));
} else {
    inStr = conn.getInputStream();
}

To also set the user-agent add the following code:

conn.setRequestProperty("User-Agent", "my agent name");
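
To actually consume the wrapped stream, something along these lines works (a sketch that assumes the page is UTF-8; in practice you'd parse the charset out of the Content-Type header):

StringBuilder page = new StringBuilder();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(inStr, "UTF-8"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        page.append(line).append('\n');
    }
}
String html = page.toString();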

Well, you could go with the built-in libraries such as URL and URLConnection, but they don't give very much control.

Personally, I'd go with the Apache HttpClient library.
Edit: Apache has declared HttpClient (Commons HttpClient) end-of-life; its replacement is Apache HttpComponents.
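
With HttpComponents (the HttpClient 4.x API; 5.x moved to different package names, so check the current docs), a basic GET looks roughly like this sketch:

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpClientExample {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpGet request = new HttpGet("http://stackoverflow.com/");
            try (CloseableHttpResponse response = client.execute(request)) {
                // EntityUtils.toString consumes the body using the response's declared charset
                String html = EntityUtils.toString(response.getEntity());
                System.out.println(html);
            }
        }
    }
}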


None of the approaches mentioned above downloads the web page text as it looks in the browser. These days a lot of data is loaded into the page by scripts after the initial HTML arrives; none of the techniques above executes scripts, they just download the raw HTML. HtmlUnit does support JavaScript, so if you want to download the web page text as it looks in the browser, you should use HtmlUnit.
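
A minimal HtmlUnit sketch (class names from HtmlUnit 2.x; the URL is just an example):

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitExample {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // HtmlUnit fetches the page and runs its JavaScript
            HtmlPage page = webClient.getPage("http://stackoverflow.com/");
            // asXml() serializes the DOM after the scripts have executed
            System.out.println(page.asXml());
        }
    }
}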


Sometimes you need to extract HTML from a secure web page (the https protocol). In the following example, the HTML is saved into c:\temp\filename.html. Enjoy!

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

import javax.net.ssl.HttpsURLConnection;

/**
 * <b>Get the Html source from the secure url </b>
 */
public class HttpsClientUtil {
    public static void main(String[] args) throws Exception {
        String httpsURL = "https://stackoverflow.com";
        String FILENAME = "c:\\temp\\filename.html";
        BufferedWriter bw = new BufferedWriter(new FileWriter(FILENAME));
        URL myurl = new URL(httpsURL);
        HttpsURLConnection con = (HttpsURLConnection) myurl.openConnection();
        con.setRequestProperty ( "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0" );
        InputStream ins = con.getInputStream();
        // Use the charset the page actually declares; Windows-1252 is just this example's choice
        InputStreamReader isr = new InputStreamReader(ins, "Windows-1252");
        BufferedReader in = new BufferedReader(isr);
        String inputLine;

        // Write each line into the file
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine);
            bw.write(inputLine);
            bw.newLine(); // readLine() strips line terminators, so add them back
        }
        in.close(); 
        bw.close();
    }
}

To do so using NIO.2's powerful Files.copy(InputStream in, Path target):

// Files.copy refuses to overwrite an existing file unless REPLACE_EXISTING is passed
try (InputStream in = new URL("http://download.me/").openStream()) {
    Files.copy(in, Paths.get("downloaded.html"));
}