
404 vs 403 when directory index is missing

This is mostly a philosophical question about the best way to interpret the HTTP spec. Should a directory with no directory index (e.g. index.html) return 404 or 403? (403 is the default in Apache.)

For example, suppose the following URLs exist and are accessible:

http://example.com/files/file_1/
http://example.com/files/file_2/

But there's nothing at:

http://example.com/files/

(Assume we're using 301s to force trailing slashes for all URLs.)

I think several things should be taken into account:

  • By default, Apache returns 403 in this scenario. That's significant to me. They've thought about this stuff, and they made the decision to use 403.
  • According to RFC 2616 (the HTTP/1.1 spec, which W3C mirrors), 403 means "The server understood the request, but is refusing to fulfill it." I take that to mean you should return 403 if the URL is meaningful but nonetheless forbidden.
  • 403 might result in information disclosure if the client correctly guesses that the URL maps to a real directory on disk.
  • http://example.com/files/ isn't a resource, and the fact that it internally maps to a directory shouldn't be relevant to the status code.
  • If you interpret the URL scheme as defining a directory structure from the client's perspective, the internal implementation is still irrelevant, but perhaps the outward appearance should indeed have some bearing on the status codes. Maybe, even if you created the same URL structure without using directories internally, you should still use 403s, because it's about the client's perception of a directory structure.

On balance, what do you think is the best approach? Should we just say "a resource is a resource, and if it doesn't exist, it's a 404"? Or should we say "if it has slashes, it looks like a directory to the client, and therefore it's a 403 if there's no index"?

If you're in the 403 camp, do you think you should go out of your way to return 403s even if the internal implementation doesn't use directories? Suppose, for example, that you have a dynamic web app with this URL: http://example.com/users/joe, which maps to some code that generates the profile page for Joe. Assuming you don't write something that lists all users, should http://example.com/users/ return 403? (Many if not all web frameworks return 404 in this case.)
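To make the framework case concrete, here is a minimal sketch of why a route-based app tends to answer 404 for http://example.com/users/. The route table, the `resolve` function, and the `show_profile` handler name are all hypothetical and not taken from any particular framework; the only point is that the app matches URL patterns and has no notion of a "directory" to forbid.

```python
import re
from typing import Optional, Tuple

# Hypothetical route table: (compiled URL pattern, handler name).
ROUTES = [
    (re.compile(r"^/users/(?P<name>[^/]+)$"), "show_profile"),
]


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, handler name) for a request path."""
    for pattern, handler in ROUTES:
        if pattern.match(path):
            return 200, handler
    # No pattern matched, so from the app's point of view nothing exists here.
    return 404, None


print(resolve("/users/joe"))
print(resolve("/users/"))
```

Returning 403 for /users/ in such an app would require an extra, deliberate route whose only job is to refuse the request.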

rlkw1024 asked Feb 22 '11 07:02



2 Answers

The first step in answering this is to refer to RFC 2616 (HTTP/1.1), specifically the sections on 403 Forbidden and 404 Not Found.

  • 10.4.4 403 Forbidden

The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.

  • 10.4.5 404 Not Found

The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.

My interpretation of this is that 404 is the more general error code: it just says "there's nothing there" without giving a reason. 403 says "I understood the request, but I'm refusing it, so don't try again!"

One reason why Apache might return 403 for a directory without an explicit index file is that auto-indexing (i.e. listing all the files in it) is disabled (a.k.a. "forbidden"). In that case, saying "listing all files in this directory is forbidden" makes more sense than saying "there is no directory".
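The decision described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Apache's actual logic: the function name `status_for_directory`, the `autoindex` and `hide_forbidden` flags, and the `INDEX_NAMES` tuple are all made up for the example. The `hide_forbidden` branch reflects the RFC 2616 wording that 404 can be used instead of 403 when the server does not want to reveal why it refused.

```python
import tempfile
from pathlib import Path

INDEX_NAMES = ("index.html",)  # assumed index file names for this sketch


def status_for_directory(directory: Path, autoindex: bool = False,
                         hide_forbidden: bool = False) -> int:
    """Return the HTTP status for a request that maps to `directory`."""
    if not directory.is_dir():
        return 404  # nothing on disk matches the URL
    if any((directory / name).is_file() for name in INDEX_NAMES):
        return 200  # an index file exists and would be served
    if autoindex:
        return 200  # a generated listing would be served
    # The directory exists, but listing it is disabled.
    # Masking the refusal as 404 avoids confirming that the directory exists.
    return 404 if hide_forbidden else 403


with tempfile.TemporaryDirectory() as root:
    files = Path(root) / "files"
    (files / "file_1").mkdir(parents=True)
    (files / "file_1" / "index.html").write_text("hello")

    results = {
        "index present": status_for_directory(files / "file_1"),
        "no index": status_for_directory(files),
        "no index, masked": status_for_directory(files, hide_forbidden=True),
        "missing": status_for_directory(files / "nope"),
    }
    print(results)
```

With `hide_forbidden=True`, a client can no longer tell an existing directory without an index from a path that does not exist at all, which addresses the information-disclosure concern raised in the question.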

Joachim Sauer answered Sep 22 '22 10:09


Another argument for why 404 is preferable: Google Webmaster Tools.

For a 404, Google Webmaster Tools displays the referring page (allowing you to clean up the bad link to the directory), whereas for a 403 it does not.

Alain Knaff answered Sep 22 '22 10:09