
Java - Quickest way to check if URL exists

Tags: java, url, jsoup

Hi, I am writing a program that goes through many different URLs and just checks whether they exist. Basically, I check whether the returned status code is 404. However, since I am checking over 1000 URLs, I want this to run as fast as possible. Here is my code; how can I modify it to run faster (if that is possible)?

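import java.net.HttpURLConnection;
import java.net.URL;

public class UrlCheck {
    public static void main(String[] args) throws Exception {
        final URL url = new URL("http://www.example.com");
        HttpURLConnection huc = (HttpURLConnection) url.openConnection();
        int responseCode = huc.getResponseCode(); // sends the request and reads the status line

        if (responseCode != 404) {
            System.out.println("GOOD");
        } else {
            System.out.println("BAD");
        }
    }
}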

Would it be quicker to use JSoup?

I am aware that some sites return status 200 together with their own error page. However, I know the links I am checking don't do this, so I don't need to handle that case.

Asked Aug 08 '13 19:08 by Matt9Atkins


People also ask

How do I check if a URL exists?

Whether a URL exists can be determined from the status code of the HTTP response. Status code 200 is the standard response for a successful HTTP request, and status code 404 (Not Found) means nothing exists at that URL. In PHP, the get_headers() function fetches all the headers the server sends in response to an HTTP request.

Which HTTP method should be used to check whether a URI exists?

If you are only checking whether a resource exists, it is better to use a HEAD request than a GET. This avoids the overhead of transferring the resource itself. Note that such a check returns true only if the response code is 200.
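A minimal sketch of such a HEAD-based check with HttpURLConnection, assuming plain http:// or https:// URLs. The class name HeadCheck, the method exists, the 5-second timeouts, and the choice to treat any I/O error as "does not exist" are illustrative assumptions, not part of the original text.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class HeadCheck {

    /**
     * Sends a HEAD request and returns true only if the server answers 200 OK.
     * Malformed URLs, unknown hosts, refused connections and timeouts all count
     * as "does not exist" (an assumption of this sketch).
     */
    static boolean exists(String address) {
        HttpURLConnection huc = null;
        try {
            // Assumes an http:// or https:// address; other schemes do not yield an HttpURLConnection.
            huc = (HttpURLConnection) new URL(address).openConnection();
            huc.setRequestMethod("HEAD");   // ask for headers only, no response body
            huc.setConnectTimeout(5000);    // assumed limits in milliseconds; tune for your sites
            huc.setReadTimeout(5000);
            return huc.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            return false;
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Usage: java HeadCheck http://www.example.com http://www.example.com/missing
        for (String address : args) {
            System.out.println((exists(address) ? "GOOD " : "BAD  ") + address);
        }
    }
}
```

Some servers may reject HEAD (for example with 405 Method Not Allowed), so for such hosts a fallback to GET may be needed.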


2 Answers

Try sending a HEAD request instead of a GET request. That should be faster, since the response body is not downloaded.

huc.setRequestMethod("HEAD"); 

Also, instead of checking that the response status is not 404, check that it is 200; that is, test for the positive case rather than the negative one. 404, 403, 402 and the other 4xx statuses are, for this purpose, nearly equivalent to an invalid or non-existent URL.
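To see why the positive check is stricter, here is a small pure-Java comparison of the two conditions on a few sample status codes. The class name, method names and sample codes are illustrative only. With `!= 404`, a 403 Forbidden, 410 Gone or 500 server error would all be reported as GOOD; with `== 200`, none of them are.

```java
import java.net.HttpURLConnection;

public class StatusCheck {

    // The question's test: anything other than 404 is treated as "exists".
    static boolean existsIfNot404(int status) {
        return status != HttpURLConnection.HTTP_NOT_FOUND;
    }

    // The answer's suggestion: only 200 OK is treated as "exists".
    static boolean existsIfOk(int status) {
        return status == HttpURLConnection.HTTP_OK;
    }

    public static void main(String[] args) {
        int[] samples = {200, 403, 404, 410, 500};
        for (int status : samples) {
            System.out.println(status + ": != 404 says " + existsIfNot404(status)
                    + ", == 200 says " + existsIfOk(status));
        }
    }
}
```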

You can also use multithreading to check several URLs in parallel, which should make it faster still.
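One way to apply the multithreading suggestion is a fixed-size thread pool from java.util.concurrent, so that many slow network round trips overlap instead of running one after another. This is a sketch under stated assumptions: the class ParallelUrlCheck, its methods, the pool size of 20 and the timeouts are hypothetical choices. The per-URL check is passed in as a Predicate so it can be swapped out, and a HEAD-plus-200 check (headOk) is included so the class runs on its own.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class ParallelUrlCheck {

    /** HEAD request; true only for 200 OK. I/O errors count as "missing". */
    static boolean headOk(String address) {
        HttpURLConnection huc = null;
        try {
            huc = (HttpURLConnection) new URL(address).openConnection();
            huc.setRequestMethod("HEAD");
            huc.setConnectTimeout(5000); // assumed timeout values (ms)
            huc.setReadTimeout(5000);
            return huc.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            return false;
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }

    /**
     * Runs check on every URL using a fixed pool of threads.
     * The returned map keeps the input order; a check that throws counts as false.
     */
    static Map<String, Boolean> checkAll(List<String> urls, Predicate<String> check, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so the checks run concurrently.
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String u : urls) {
                futures.add(pool.submit(() -> check.test(u)));
            }
            // Then collect the results in submission order.
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                boolean ok;
                try {
                    ok = futures.get(i).get();
                } catch (ExecutionException e) {
                    ok = false;
                }
                results.put(urls.get(i), ok);
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Boolean> results = checkAll(List.of(args), ParallelUrlCheck::headOk, 20);
        results.forEach((url, ok) -> System.out.println((ok ? "GOOD " : "BAD  ") + url));
    }
}
```

The right pool size depends on your network and on the servers involved; if many of the 1000 URLs share one host, a very large pool may just get throttled by that host.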

Answered Sep 29 '22 18:09 by Vishnuprasad R


Try asking your DNS server whether the host exists:

import java.net.InetAddress;
import java.net.UnknownHostException;

class DNSLookup {
    public static void main(String[] args) {
        String host = "stackoverflow.com";
        try {
            InetAddress inetAddress = InetAddress.getByName(host);
            // show the Internet address as name/address
            System.out.println(inetAddress.getHostName() + " " + inetAddress.getHostAddress());
        } catch (UnknownHostException exception) {
            // thrown when no IP address (no DNS record) can be found for the host
            System.err.println("ERROR: Cannot access '" + host + "'");
        }
    }
}
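A DNS lookup only tells you whether the host name resolves; a host that resolves can still answer 404 for a particular path. It can, however, serve as a cheap pre-filter that rejects URLs whose host does not exist before any HTTP request is made. A minimal sketch, assuming the input is a full URL string; the class DnsPreCheck and the method hostResolves are hypothetical names.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

public class DnsPreCheck {

    /**
     * Returns true if the URL contains a host name that resolves to an address.
     * Says nothing about the path: a resolvable host can still answer 404.
     */
    static boolean hostResolves(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI
        }
        if (host == null || host.isEmpty()) {
            return false; // e.g. "mailto:" URIs carry no host to look up
        }
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false; // no address found for this host
        }
    }

    public static void main(String[] args) {
        for (String url : args) {
            System.out.println(url + " -> " + (hostResolves(url) ? "host resolves" : "no such host"));
        }
    }
}
```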
Answered Sep 29 '22 17:09 by Khinsu