Hi, I am writing a program that goes through many different URLs and checks whether each one exists. I am basically checking whether the returned status code is 404 or not. However, as I am checking over 1000 URLs, I want to do this as quickly as possible. The following is my code; how can I modify it to run faster (if possible)?
final URL url = new URL("http://www.example.com");
HttpURLConnection huc = (HttpURLConnection) url.openConnection();
int responseCode = huc.getResponseCode();
if (responseCode != 404) {
    System.out.println("GOOD");
} else {
    System.out.println("BAD");
}
Would it be quicker to use JSoup?
I am aware that some sites return code 200 and serve their own error page. However, I know the links I am checking don't do this, so handling that case is not needed.
The existence of a URL can be checked by looking at the status code in the response header. Status code 200 is the standard response for a successful HTTP request, and status code 404 means the URL doesn't exist. In PHP, the get_headers() function fetches all the headers sent by the server in response to the HTTP request.
Java, HttpURLConnection: check if a resource exists. If you are just checking whether a resource exists, it is better to use a HEAD request than a GET. This avoids the overhead of transferring the resource. Note that the method only returns true if the response code is 200.
Try sending a "HEAD" request instead of a "GET" request. That should be faster, since the response body is not downloaded.
huc.setRequestMethod("HEAD");
Also, instead of checking that the response status is not 404, check that it is 200. That is, check for the positive case instead of the negative one. 404, 403, 402 and the other 40x statuses are all nearly equivalent to an invalid, non-existent URL.
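Putting the two suggestions above together, a minimal sketch might look like the following. The class name UrlChecker, the method name exists, and the 5-second timeouts are my own assumptions for illustration; they are not from the question. Any I/O failure (malformed URL, unknown host, refused connection, timeout) is treated as "does not exist" here, which may or may not be what you want.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class, named for illustration only.
class UrlChecker {

    /** Returns true only when a HEAD request to the address answers 200 OK. */
    static boolean exists(String address) {
        HttpURLConnection huc = null;
        try {
            huc = (HttpURLConnection) new URL(address).openConnection();
            huc.setRequestMethod("HEAD");   // headers only, no response body
            huc.setConnectTimeout(5000);    // assumed timeout: 5 seconds
            huc.setReadTimeout(5000);       // assumed timeout: 5 seconds
            return huc.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            // Malformed URL, unknown host, refused connection, timeout, ...
            return false;
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Checks every URL passed on the command line.
        for (String address : args) {
            System.out.println(address + " " + (exists(address) ? "GOOD" : "BAD"));
        }
    }
}
```

Be aware that some servers do not handle HEAD properly and may answer with 405 or another code even though a GET would succeed, so a fallback to GET could be worth considering for such hosts.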
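You can also use multi-threading to make it even faster: most of the time per URL is spent waiting on the network, so several URLs can be checked at the same time.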
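As a rough sketch of the multi-threading idea, the checks can be submitted to a fixed-size thread pool from java.util.concurrent. The names ParallelUrlChecker, checkAll and headReturns200, the pool size of 20 and the 5-second timeouts are assumptions chosen for illustration; tune them to your network and to what the target servers tolerate.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical helper class, named for illustration only.
class ParallelUrlChecker {

    /** HEAD request; true only for a 200 response, false on any I/O error. */
    static boolean headReturns200(String address) {
        HttpURLConnection huc = null;
        try {
            huc = (HttpURLConnection) new URL(address).openConnection();
            huc.setRequestMethod("HEAD");
            huc.setConnectTimeout(5000); // assumed timeout
            huc.setReadTimeout(5000);    // assumed timeout
            return huc.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            return false;
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }

    /** Runs the check for every URL on a pool of worker threads. */
    static Map<String, Boolean> checkAll(List<String> urls, Predicate<String> check, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // LinkedHashMap keeps input order; duplicate URLs collapse into one entry.
        Map<String, Boolean> results = new LinkedHashMap<>();
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<Boolean> task = () -> check.test(url);
                futures.add(pool.submit(task));
            }
            for (int i = 0; i < urls.size(); i++) {
                boolean ok;
                try {
                    ok = futures.get(i).get(); // waits for this task to finish
                } catch (ExecutionException e) {
                    ok = false; // the check itself threw an exception
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    ok = false;
                }
                results.put(urls.get(i), ok);
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        // Checks every URL passed on the command line with 20 threads (assumed size).
        Map<String, Boolean> results =
                checkAll(Arrays.asList(args), ParallelUrlChecker::headReturns200, 20);
        results.forEach((url, ok) -> System.out.println(url + " " + (ok ? "GOOD" : "BAD")));
    }
}
```

Passing the check as a Predicate keeps the threading code separate from the HTTP code, so the same pool logic can be reused with a different check, such as a DNS-only lookup.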
Try asking the DNS server whether the host exists:
import java.net.InetAddress;
import java.net.UnknownHostException;

class DNSLookup {
    public static void main(String[] args) {
        String host = "stackoverflow.com";
        try {
            // Resolve the host name through DNS
            InetAddress inetAddress = InetAddress.getByName(host);
            // show the Internet Address as name/address
            System.out.println(inetAddress.getHostName() + " " + inetAddress.getHostAddress());
        } catch (UnknownHostException exception) {
            // Thrown when no IP address can be found for the host
            System.err.println("ERROR: Cannot access '" + host + "'");
            exception.printStackTrace();
        }
    }
}