
Manually sending GET request to a website. 302 redirect error

I am currently working on a web scraper in Java. I am manually sending the GET request by opening a TCP connection and writing to it with a PrintWriter.

I am able to connect to most websites, such as yahoo.com or cracked.com, and receive a response, but I am unable to get the page from my target website, vinylengine.com. It always returns a 302 redirect.

I have compared the request I send with my browser's, and they are nearly identical.

My request:

GET / HTTP/1.1
Host: www.vinylengine.com

My Response:

HTTP/1.1 302 Found
Date: Thu, 06 Jun 2013 19:27:00 GMT
Server: Apache
Location: http://www.nakedresource.com/
Cache-Control: max-age=1209600
Expires: Thu, 20 Jun 2013 19:27:00 GMT
Content-Length: 213
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://www.nakedresource.com/">here</a>.</p>
</body></html>

Browser's header:

GET http://www.vinylengine.com/ HTTP/1.1
Host: www.vinylengine.com
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=72407316.18415374.1370488314.1370497873.1370543389.3; __utmz=72407316.1370488314.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); SESSaf8d12283bdbdc5f5bbfb2aef054db6d=1f0676e5cab0ba2c5a80e76ea0bd6f75; __utmc=72407316; has_js=1; __utmb=72407316
Connection: keep-alive
If-Modified-Since: Thu, 06 Jun 2013 18:02:53 GMT
If-None-Match: "2186d59ac297e0f1a43433fa61e8a94b"

Code:

public void sendRequest(String extensionString, String urlString)
{  
    try 
    {
        //BufferedReader inFromServer;
        //PrintWriter outToServer;
        //These 2 are initialized elsewhere

        outToServer.println("GET " + extensionString + " HTTP/1.1");
        outToServer.println("Host: " + urlString);

        outToServer.println("");
        outToServer.flush();

        String temp;
        while((temp=inFromServer.readLine()) != null) 
        {
            System.out.println(temp);
        }

        return;
    } 
    catch (Exception e) 
    {
        System.out.printf("sendRequest failed: %s",e);
        return;
    }
}

I have tried changing the host name to nakedresource.com, but when I do that I get the page source for nakedresource.com, not vinylengine.com.

Python Lord asked Jun 06 '13 19:06




3 Answers

The site in question is looking at your User-Agent string (or the lack of one, in your case).

When you say you're doing "almost the same thing" as the browser ... you're right. And computers are kinda picky about things like that.

If you don't supply a User-Agent: header you get a redirect.

> telnet www.vinylengine.com 80
Trying 67.225.154.112...
Connected to vinylengine.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.vinylengine.com
Accept: */*

HTTP/1.1 302 Found
...

Whereas if you do provide one, you get the page:

> telnet www.vinylengine.com 80
Trying 67.225.154.112...
Connected to vinylengine.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.vinylengine.com
User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
Accept: */*

HTTP/1.1 200 OK
... (the page)

This is usually done because the site is providing different versions of the content to different browsers as determined by the User-Agent header. Apparently their answer to "no User-Agent" is ... punt, and you get redirected to the parent site root.
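Applied to the Java code in the question, the fix might look like the sketch below. It is only an illustration: the class name `RawGet`, the helper names, and the `MyScraper/1.0` User-Agent value are made up for this example, and whether the site accepts that particular value is an assumption. The request lines are joined with explicit `\r\n` instead of `println()`, because HTTP expects CRLF line endings, and `Connection: close` is added so the read loop ends when the server is done.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawGet {
    // Builds the full request text, including a User-Agent header.
    // "MyScraper/1.0"-style values are hypothetical; use whatever identifies your client.
    static String buildRequest(String host, String path, String userAgent) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "User-Agent: " + userAgent + "\r\n"
             + "Accept: */*\r\n"
             + "Connection: close\r\n"   // ask the server to close so readLine() eventually returns null
             + "\r\n";                   // blank line ends the header section
    }

    // Sends the request over a plain TCP socket on port 80 and prints the raw response.
    static void fetch(String host, String path, String userAgent) throws IOException {
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path, userAgent).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Without arguments, just show the request that would be sent (no network access).
            System.out.print(buildRequest("www.vinylengine.com", "/", "MyScraper/1.0"));
            return;
        }
        fetch(args[0], "/", "MyScraper/1.0");
    }
}
```

Running it with a host name as the first argument sends the request; running it with no arguments only prints the request text so you can compare it with what the browser sends.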

Brian Roach answered Nov 14 '22 22:11


HttpURLConnection.setFollowRedirects(true);

If you are using HttpURLConnection, use the code above so that redirects are followed automatically.

Also refer to Example showing HTTP redirects
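For context, a minimal sketch of what that could look like with `HttpURLConnection` follows. The class name `RedirectAwareGet`, the `open` helper, and the `MyScraper/1.0` User-Agent value are hypothetical names chosen for this example. It uses the per-connection `setInstanceFollowRedirects` rather than the static setter, and it also sets a User-Agent header, on the assumption that the site keys its redirect on that header.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectAwareGet {
    // Creates (but does not yet connect) a connection that follows redirects
    // and sends the given User-Agent. No network traffic happens here.
    static HttpURLConnection open(String url, String userAgent) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(true);             // per-connection setting
            conn.setRequestProperty("User-Agent", userAgent);  // hypothetical value, see lead-in
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open("http://www.vinylengine.com/", "MyScraper/1.0");
        System.out.println("Follow redirects: " + conn.getInstanceFollowRedirects());
        if (args.length > 0) {
            // Only touch the network when an argument is passed.
            System.out.println(conn.getResponseCode() + " " + conn.getURL());
            conn.disconnect();
        }
    }
}
```

Note that following the redirect only helps if the redirect target is the page you want. In the question, the 302 points at a different site, so the User-Agent header is the part that is likely to matter.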

user1889970 answered Nov 14 '22 21:11


This can happen when a proxy is configured in your browser but your JVM is unaware of it.

Try to start your JVM with the following arguments and see if it fixes the issue:

-Dhttp.proxyHost=10.12.11.1 -Dhttp.proxyPort=8800
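
The same proxy can also be set up in code. The sketch below is an illustration only: the class name `ProxyConfig` and the helper `httpProxy` are made up, and the address and port are simply the ones from the arguments above. As far as I know, the `http.proxyHost` properties apply to `URLConnection`-based code and not to a hand-opened `Socket` like the one in the question, so this would only be relevant after switching to `HttpURLConnection`.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

class ProxyConfig {
    // Describes an HTTP proxy without resolving or contacting it.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        Proxy proxy = httpProxy("10.12.11.1", 8800);
        // Pass it to URL.openConnection(proxy) to route one connection through the proxy.
        System.out.println(proxy.type() + " " + proxy.address());
    }
}
```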

Chris answered Nov 14 '22 22:11