What I learned from Foregenix:
The HTTP 404 Not Found Error means that the webpage you were trying to reach could not be found on the server. It is a Client-side Error which means that either the page has been removed or moved and the URL was not changed accordingly, or that you typed in the URL incorrectly.
But I also do web app pentests with Python, and I am wondering: if I only check for the string 404 on the page, it may not really be a 404 error. The page could exist and simply have 404 as its heading, just to fool us.
So how exactly do I find out?
A 404 page is a landing page that tells your site viewers the requested page is unavailable or, in some cases, doesn't exist. A 404 error tells users the page cannot be accessed – and it can be a major problem. When users can't access a page, they can't find the information they need.
A 404 page is the webpage served to a user who tries to access a page that cannot be located at the URL provided. The 404 error indicates that the server where the page should reside has been contacted but that the page does not exist at that address, at that time.
You can check the HTTP status code, and see if it is 404 or not. The status code is on the first line of the response:
HTTP/1.1 404 Not Found
If you are using httplib (http.client in Python 3), you can just read the status property of the HTTPResponse object.
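As a minimal sketch of that approach, assuming Python 3 and the standard library only: the helper below sends a GET request and returns the status code and reason phrase from the response. The name fetch_status and the default timeout are my own choices for illustration. Note that http.client does not follow redirects, so you see the status the server actually sent for that URL.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10.0):
    """Return (status, reason) for a GET request to url.

    fetch_status is a hypothetical helper name, not part of http.client.
    """
    parts = urlsplit(url)
    # Pick the connection class from the URL scheme.
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body; the status comes from the status line
        return resp.status, resp.reason
    finally:
        conn.close()
```

A page whose body says "404" but whose status is 200 would come back from this helper as 200, which is exactly the case the question is worried about.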
However, it is the server that decides what HTTP status code to send. Just because 404 is defined to mean "page not found" does not mean the server cannot lie to you. It is quite common to do things like this:
Without access to the server, it is impossible to know what is really going on behind the curtains.
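That said, one heuristic sometimes used in testing is to compare the response you got against a baseline: the server's response to a path that almost certainly does not exist (for example, a long random string). If the two look nearly the same, the page is probably missing, whatever the status code claims. The sketch below only does the comparison; the function name probably_missing, its parameters, and the 0.9 threshold are assumptions of mine, and the result is a guess, not proof.

```python
import difflib


def probably_missing(status, body, baseline_status, baseline_body,
                     threshold=0.9):
    """Heuristic guess at whether a response is really a "not found" page.

    baseline_status and baseline_body are assumed to come from a request
    for a random path that should not exist on the same server.
    """
    if status == 404:
        # The server says so explicitly.
        return True
    if status != baseline_status:
        # Different status than the known-missing page: treat as distinct.
        return False
    # Same status: compare the bodies and flag near-identical ones.
    similarity = difflib.SequenceMatcher(None, body, baseline_body).ratio()
    return similarity >= threshold
```

A server that deliberately returns 404 for a page that exists will still fool this check, which is the point made above: without access to the server you can only estimate.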
You are right: someone could write "404 Page Not Found" in an HTML page and make you think that the page doesn't exist.
To properly recognize HTTP status codes such as 404, you should capture the HTTP response with Python and parse it. The HTTP/1.1 and HTTP/2 standards dictate that every HTTP response must contain a status code; in HTTP/1.1 it appears on the first line of the message.
Example of an HTTP response (from Tutorials Point):
HTTP/1.1 404 Not Found
Date: Sun, 18 Oct 2012 10:36:20 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 230
Connection: Closed
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>404 Not Found</title>
</head>
<body>
<h1>Not Found</h1>
<p>The requested URL /t.html was not found on this server.</p>
</body>
</html>
You should definitely not trust the HTML part, which can show a 404 error (or even a 418 I'm a teapot) when the page can in fact be found.
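To make the distinction concrete, here is a small sketch that reads the status code from the first line of a raw HTTP/1.x response like the one above and ignores the body entirely. The function name parse_status_line is hypothetical, and it assumes you already have the raw response bytes (for instance from a socket or a proxy log). It does not apply to HTTP/2, which has no textual status line.

```python
def parse_status_line(raw):
    """Split the first line of a raw HTTP/1.x response.

    Returns (version, status_code, reason). Only the status line is
    inspected; whatever the HTML body claims is ignored.
    """
    first_line = raw.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = first_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError("not an HTTP status line: %r" % first_line)
    version, code = parts[0], parts[1]
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("invalid status code: %r" % code)
    # The reason phrase is optional and purely informational.
    reason = parts[2] if len(parts) == 3 else ""
    return version, int(code), reason


# A body that shouts "404" does not change the real status.
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"<h1>404 Not Found</h1>")
print(parse_status_line(raw))  # ('HTTP/1.1', 200, 'OK')
```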