
How can a web server know when an HTTP request is fully received?

Tags: c++, http, sockets

I'm currently writing a very simple web server to learn more about low-level socket programming. More specifically, I'm using C++ as my main language and trying to encapsulate the low-level C system calls inside C++ classes with a higher-level API.

I have written a Socket class that manages a socket file descriptor and handles opening and closing using RAII. This class also exposes the standard socket operations for a connection-oriented (TCP) socket, such as bind, listen, accept, and connect.

After reading the man pages for the send and recv system calls I realized that I needed to call these functions inside some form of loop in order to guarantee that all bytes are successfully sent/received.

My API for sending and receiving looks similar to this:

void SendBytes(const std::vector<std::uint8_t>& bytes) const;
void SendStr(const std::string& str) const;
std::vector<std::uint8_t> ReceiveBytes() const;
std::string ReceiveStr() const;

For the send functionality, I decided to use a blocking send call inside a loop, as shown here (it is an internal helper function that works for both std::string and std::vector):

template<typename T>
void Send(const int fd, const T& bytes)
{
   using ValueType = typename T::value_type;
   using SizeType = typename T::size_type;

   // Note: the byte counting below assumes one-byte elements (char, std::uint8_t).
   const ValueType *const data{bytes.data()};
   SizeType bytesToSend{bytes.size()};
   SizeType bytesSent{0};
   // send may accept fewer bytes than requested, so keep calling it
   // until the whole buffer has been handed over.
   while (bytesToSend > 0)
   {
      const ValueType *const buf{data + bytesSent};
      const ssize_t retVal{send(fd, buf, bytesToSend, 0)};
      if (retVal < 0)
      {
          throw ch::NetworkError{"Failed to send."};
      }
      const SizeType sent{static_cast<SizeType>(retVal)};
      bytesSent += sent;
      bytesToSend -= sent;
   }
}

This seems to work fine and guarantees that all bytes are sent once the member function returns without throwing an exception.

However, I started running into problems when I began implementing the receive functionality. For my first attempt, I used a blocking recv call inside a loop and exited the loop when recv returned 0, indicating that the underlying TCP connection had been closed.

template<typename T>
T Receive(const int fd)
{
   using SizeType = typename T::size_type;
   using ValueType = typename T::value_type;

   T result;

   const SizeType bufSize{1024};
   ValueType buf[bufSize];
   while (true)
   {
      const ssize_t retVal{recv(fd, buf, bufSize, 0)};
      if (retVal < 0)
      {
          throw ch::NetworkError{"Failed to receive."};
      }

      if (retVal == 0)
      {
          break; /* Connection is closed. */
      }

      const SizeType offset{static_cast<SizeType>(retVal)};
      result.insert(std::end(result), buf, buf + offset);
   }

   return result;
}

This works fine as long as the sender closes the connection after sending all bytes. However, that is not the case when using, e.g., Chrome to request a web page: the connection is kept open, and my receive member function stays blocked in the recv system call after it has received all bytes of the request. I managed to get around this problem by setting a timeout on the recv call using setsockopt; basically, I return all bytes received so far once the timeout expires. This feels like a very inelegant solution, and I do not think this is how web servers handle this issue in reality.
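For reference, a receive timeout of this kind is usually set with the SO_RCVTIMEO socket option. The following is only a minimal sketch assuming POSIX sockets; SetReceiveTimeout is a hypothetical helper name and not part of the Socket class described above.

```cpp
#include <sys/socket.h>
#include <sys/time.h>

#include <chrono>

// Hypothetical helper: after this call, a blocking recv on fd fails with
// errno EAGAIN or EWOULDBLOCK if no data arrives within `timeout`.
// Returns true if the option was set successfully.
bool SetReceiveTimeout(const int fd, const std::chrono::milliseconds timeout)
{
   timeval tv{};
   tv.tv_sec = static_cast<decltype(tv.tv_sec)>(timeout.count() / 1000);
   tv.tv_usec = static_cast<decltype(tv.tv_usec)>((timeout.count() % 1000) * 1000);
   return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0;
}
```

With the option set, a recv that returns -1 because of the timeout can be told apart from a real error by checking errno.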

So, on to my question.

How does a web server know when an HTTP request has been fully received?

A GET request in HTTP/1.1 does not seem to include a Content-Length header. See e.g. this link.

JonatanE asked Jan 08 '19 14:01



1 Answer

HTTP/1.1 is a text-based protocol, with binary POST data added in a somewhat hacky way. When writing a "receive loop" for HTTP, you cannot completely separate the data-receiving part from the HTTP-parsing part, because certain byte sequences have special meaning in HTTP. In particular, the CRLF (0x0D 0x0A) token terminates each header line, and two CRLF tokens one after the other (an empty line) mark the end of the header section, which is also the end of the request when there is no body.

So you need to keep receiving data until one of the following happens:

  • Timeout – respond with a timeout error (e.g. 408 Request Timeout)
  • Two consecutive CRLF in the request – parse the request, then respond as needed (parsed correctly? request makes sense? send data?)
  • Too much data – reject the request; certain HTTP exploits aim to exhaust server resources such as memory or processes (see e.g. Slowloris)
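The three stop conditions could be combined into one receive loop roughly as follows. This is a sketch, not a definitive implementation: HeadStatus, HeadResult, ReceiveHead, and the 8192-byte default limit are all hypothetical, and it assumes POSIX sockets with a receive timeout already set on the socket via SO_RCVTIMEO.

```cpp
#include <sys/socket.h>
#include <sys/types.h>

#include <cerrno>
#include <cstddef>
#include <string>

// Hypothetical names, for illustration only.
enum class HeadStatus { Complete, Timeout, TooLarge, Closed, Error };

struct HeadResult
{
   HeadStatus status;
   std::string data;     // All bytes received so far (may include body bytes).
   std::size_t headEnd;  // Index just past "\r\n\r\n"; meaningful when Complete.
};

// Receives until the empty line that ends the header section arrives,
// or until one of the other stop conditions is hit.
HeadResult ReceiveHead(const int fd, const std::size_t maxBytes = 8192)
{
   HeadResult result{HeadStatus::Error, std::string{}, 0};
   char buf[1024];
   while (true)
   {
      const ssize_t n{recv(fd, buf, sizeof buf, 0)};
      if (n < 0)
      {
         if (errno == EINTR)
         {
            continue; // Interrupted by a signal: retry.
         }
         // Assumption: SO_RCVTIMEO is set, so an expired timeout is
         // reported as EAGAIN or EWOULDBLOCK.
         const bool timedOut{errno == EAGAIN || errno == EWOULDBLOCK};
         result.status = timedOut ? HeadStatus::Timeout : HeadStatus::Error;
         return result;
      }
      if (n == 0)
      {
         result.status = HeadStatus::Closed; // Peer closed before the empty line.
         return result;
      }

      // The terminator may straddle two recv calls, so restart the search
      // three bytes before the newly appended data.
      const std::size_t oldSize{result.data.size()};
      const std::size_t scanFrom{oldSize < 3 ? 0 : oldSize - 3};
      result.data.append(buf, static_cast<std::size_t>(n));

      const std::size_t pos{result.data.find("\r\n\r\n", scanFrom)};
      if (pos != std::string::npos)
      {
         result.status = HeadStatus::Complete;
         result.headEnd = pos + 4;
         return result;
      }
      if (result.data.size() > maxBytes)
      {
         result.status = HeadStatus::TooLarge; // Guard against resource exhaustion.
         return result;
      }
   }
}
```

The caller would then map each status to a response: parse and answer on Complete, send a timeout response on Timeout, and reject the request on TooLarge.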

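There are perhaps other edge cases as well. Also note that the conditions above only apply to requests without a body. For requests with a body, such as POST, you first wait for the two CRLF tokens, parse the headers, and then read an additional Content-Length bytes. It gets more complicated still when the client uses chunked transfer encoding (Transfer-Encoding: chunked), where no Content-Length is sent, or multipart encoding.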
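To illustrate the Content-Length step, here is a rough sketch of how the number of outstanding body bytes might be computed once the header section has been received. ContentLength and BodyBytesMissing are hypothetical helpers, the code assumes C++17 (std::optional, std::string_view), and it deliberately ignores chunked transfer encoding and integer overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: returns the Content-Length value found in a header
// block (lines separated by CRLF), or std::nullopt if the header is absent
// or its value is not a plain decimal number.
std::optional<std::size_t> ContentLength(const std::string_view head)
{
   constexpr std::string_view name{"content-length:"};
   std::size_t lineStart{0};
   while (lineStart < head.size())
   {
      std::size_t lineEnd{head.find("\r\n", lineStart)};
      if (lineEnd == std::string_view::npos)
      {
         lineEnd = head.size();
      }
      const std::string_view line{head.substr(lineStart, lineEnd - lineStart)};
      lineStart = lineEnd + 2;

      // Header names are case-insensitive.
      bool match{line.size() > name.size()};
      for (std::size_t i{0}; match && i < name.size(); ++i)
      {
         match = std::tolower(static_cast<unsigned char>(line[i])) == name[i];
      }
      if (!match)
      {
         continue;
      }

      // Strip optional whitespace around the value, then require digits only.
      std::string_view value{line.substr(name.size())};
      while (!value.empty() && (value.front() == ' ' || value.front() == '\t'))
      {
         value.remove_prefix(1);
      }
      while (!value.empty() && (value.back() == ' ' || value.back() == '\t'))
      {
         value.remove_suffix(1);
      }
      if (value.empty())
      {
         return std::nullopt;
      }
      std::size_t length{0};
      for (const char c : value)
      {
         if (c < '0' || c > '9')
         {
            return std::nullopt;
         }
         length = length * 10 + static_cast<std::size_t>(c - '0'); // No overflow check.
      }
      return length;
   }
   return std::nullopt;
}

// Hypothetical helper: given everything received so far and the index just
// past the "\r\n\r\n" terminator, returns how many body bytes are still missing.
std::size_t BodyBytesMissing(const std::string_view received, const std::size_t headEnd)
{
   const std::optional<std::size_t> length{ContentLength(received.substr(0, headEnd))};
   const std::size_t have{received.size() - headEnd};
   if (!length || *length <= have)
   {
      return 0;
   }
   return *length - have;
}
```

A receive loop would keep calling recv and appending to its buffer until BodyBytesMissing returns 0, subject to the same timeout and size limits as for the header section.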

Aurel Bílý answered Nov 14 '22 22:11