
java.io.FileNotFoundException for valid URL

I use the ROME library (rome.dev.java.net) to fetch RSS feeds.

The code is:

URL feedUrl = new URL("http://planet.rubyonrails.ru/xml/rss");
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(feedUrl));

You can check that http://planet.rubyonrails.ru/xml/rss is a valid URL and that the page displays in a browser.

But my application throws this exception:

java.io.FileNotFoundException: http://planet.rubyonrails.ru/xml/rss
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1311)
        at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:237)
        at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:213)
        at rssdaemonapp.ValidatorThread.run(ValidatorThread.java:32)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

I don't use any proxy. I get this exception both on my PC and on the production server, and only for this URL; other URLs work.

Alexei asked May 08 '10 12:05



2 Answers

The code that is throwing that exception looks like this ... assuming I've got the right version:

if (respCode >= 400) {
    if (respCode == 404 || respCode == 410) {
        throw new FileNotFoundException(url.toString());
    } else {
        throw new java.io.IOException(
            "Server returned HTTP"
            + " response code: " + respCode
            + " for URL: " + url.toString());
    }
}

In other words, when you are doing the GET from Java, you are getting a 404 or 410 response. Now when I do the request using the wget utility, I get a 200 response. So my guess is that the problem is one of the following:

  • You happened to make the request when they were suffering from some configuration problem.
  • They have implemented their server to return 404 / 410 for certain User-Agent strings.

Other possibilities are that they are doing some kind of server-side filtering on IP addresses or that there is some DNS problem that is causing your requests to go to a different IP address. But both of these seem to be contradicted by the fact that you can access the feed in your browser.
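One way to tell these cases apart is to ask for the status code directly, once with the JVM's default User-Agent and once with a browser-like one. The following is only a sketch: the class name StatusProbe, the helper statusFor and the "Mozilla/5.0" string are made up for illustration, and it assumes the server decides based on the User-Agent header alone.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of ROME or of the asker's application.
class StatusProbe {

    // Returns the HTTP status code for a GET request.
    // Pass null as userAgent to keep the JVM default, which normally starts with "Java/".
    static int statusFor(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("GET");
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            // Unlike getInputStream(), getResponseCode() reports a 404 or 410
            // as a number instead of throwing FileNotFoundException.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        URL feedUrl = new URL("http://planet.rubyonrails.ru/xml/rss");
        System.out.println("default agent:      " + statusFor(feedUrl, null));
        // "Mozilla/5.0" is an arbitrary browser-like value chosen for the comparison.
        System.out.println("browser-like agent: " + statusFor(feedUrl, "Mozilla/5.0"));
    }
}
```

If the two lines print different codes, the User-Agent explanation looks likely; if both print 404 or 410, the cause is probably elsewhere.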

If it is the User-Agent, take a look at their terms of service to see whether they have banned certain kinds of use of their site / RSS feed.

Stephen C answered Oct 02 '22 14:10



I suspect the server doesn't like Java clients. You need to fake your "User-Agent" header; I'm not sure whether that's doable with your RSS library.

Another option is to fetch the data yourself and pass it to the feed reader.
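A rough sketch of that second approach, using only the JDK to open the connection with an explicit User-Agent. The class name FeedFetcher, the helper openFeed, the timeouts and the agent string "Mozilla/5.0 (compatible; RssDaemon/1.0)" are all hypothetical; the ROME call in the comment assumes your version has an XmlReader constructor that takes an InputStream, so check that against the version you use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of ROME.
class FeedFetcher {

    // Opens the feed with an explicit User-Agent and returns the raw response body.
    // The caller is responsible for closing the returned stream.
    static InputStream openFeed(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000); // illustrative timeouts, in milliseconds
        conn.setReadTimeout(10_000);
        // This still throws FileNotFoundException on 404/410, so a failure here
        // suggests the User-Agent was not the cause.
        return conn.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        URL feedUrl = new URL("http://planet.rubyonrails.ru/xml/rss");
        try (InputStream in = openFeed(feedUrl, "Mozilla/5.0 (compatible; RssDaemon/1.0)")) {
            // With ROME on the classpath, the stream could be handed over like this
            // (assumed API, verify for your ROME version):
            //   SyndFeed feed = new SyndFeedInput().build(new XmlReader(in));
            System.out.println("first byte of response: " + in.read());
        }
    }
}
```

If the site's terms of service restrict automated access, an honest agent string that identifies your application is a safer choice than imitating a browser.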

ZZ Coder answered Oct 02 '22 12:10
