How can a web server know when an HTTP request is fully received?

Tags: c++, http, sockets

I'm currently writing a very simple web server to learn more about low-level socket programming. More specifically, I'm using C++ as my main language and I am trying to encapsulate the low-level C system calls inside C++ classes with a higher-level API.

I have written a Socket class that manages a socket file descriptor and handles opening and closing using RAII. This class also exposes the standard operations for a connection-oriented (TCP) socket, such as bind, listen, accept and connect.

After reading the man pages for the send and recv system calls, I realized that I needed to call these functions inside some form of loop in order to guarantee that all bytes are successfully sent/received.

My API for sending and receiving looks similar to this:

void SendBytes(const std::vector<std::uint8_t>& bytes) const;
void SendStr(const std::string& str) const;
std::vector<std::uint8_t> ReceiveBytes() const;
std::string ReceiveStr() const;

For the send functionality I decided to use a blocking send call inside a loop such as this (it is an internal helper function that works for both std::string and std::vector):

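#include <sys/socket.h>
#include <sys/types.h>
#include <cerrno>

template<typename T>
void Send(const int fd, const T& bytes)
{
   using ValueType = typename T::value_type;
   using SizeType = typename T::size_type;

   const ValueType *const data{bytes.data()};
   SizeType bytesToSend{bytes.size()};
   SizeType bytesSent{0};
   while (bytesToSend > 0)
   {
      // send may accept fewer bytes than requested, so resume from the
      // first byte that has not been sent yet.
      const ValueType *const buf{data + bytesSent};
      const ssize_t retVal{send(fd, buf, bytesToSend, 0)};
      if (retVal < 0)
      {
          if (errno == EINTR)
          {
              continue; /* Interrupted by a signal before any data was sent; retry. */
          }
          throw ch::NetworkError{"Failed to send."};
      }
      const SizeType sent{static_cast<SizeType>(retVal)};
      bytesSent += sent;
      bytesToSend -= sent;
   }
}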

This seems to work fine and guarantees that all bytes are sent once the member function returns without throwing an exception.

However, I started running into problems when I began implementing the receive functionality. For my first attempt I used a blocking recv call inside a loop and exited the loop when recv returned 0, indicating that the underlying TCP connection was closed.

template<typename T>
T Receive(const int fd)
{
   using SizeType = typename T::size_type;
   using ValueType = typename T::value_type;

   T result;

   const SizeType bufSize{1024};
   ValueType buf[bufSize];
   while (true)
   {
      const ssize_t retVal{recv(fd, buf, bufSize, 0)};
      if (retVal < 0)
      {
          throw ch::NetworkError{"Failed to receive."};
      }

      if (retVal == 0)
      {
          break; /* Connection is closed. */
      }

      const SizeType offset{static_cast<SizeType>(retVal)};
      result.insert(std::end(result), buf, buf + offset);
   }

   return result;
}

This works fine as long as the sender closes the connection after all bytes have been sent. However, this is not the case when using e.g. Chrome to request a webpage. The connection is kept open, and my receive member function stays blocked in the recv system call after receiving all bytes in the request. I managed to get around this problem by setting a timeout on the recv call using setsockopt. Basically, I return all bytes received so far once the timeout expires. This feels like a very inelegant solution, and I do not think that this is how web servers handle this issue in reality.

So, on to my question.

How does a web server know when an HTTP request has been fully received?

A GET request in HTTP 1.1 does not seem to include a Content-Length header. See e.g. this link.

asked Jan 08 '19 14:01

JonatanE

1 Answer

HTTP/1.1 is a text-based protocol, with binary POST data added in a somewhat hacky way. When writing a "receive loop" for HTTP, you cannot completely separate the data receiving part from the HTTP parsing part. This is because in HTTP, certain characters have special meaning. In particular, the CRLF (0x0D 0x0A) token terminates each header line, and two CRLF tokens one after the other (an empty line) mark the end of the headers, which for a request without a body is also the end of the request.
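As a minimal sketch of the check described above, assuming the bytes received so far are accumulated in a single buffer and C++17 is available, the double CRLF can be located like this. The function name FindEndOfHeaders is made up for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: returns the offset just past the first "\r\n\r\n"
// (the end of the header section), or std::nullopt if it has not arrived yet.
std::optional<std::size_t> FindEndOfHeaders(std::string_view received)
{
    constexpr std::string_view terminator{"\r\n\r\n"};
    const std::size_t pos{received.find(terminator)};
    if (pos == std::string_view::npos)
    {
        return std::nullopt; // Keep receiving.
    }
    return pos + terminator.size();
}
```

Any bytes after the returned offset belong to the request body, or to the next request if the client pipelines requests on the same connection.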

So you need to keep receiving data until one of the following happens:

Timeout – followed by sending a timeout response
Two CRLF tokens in a row in the request – followed by parsing the request, then responding as needed (parsed correctly? request makes sense? send data?)
Too much data – certain HTTP attacks aim to exhaust server resources like memory or processes (see e.g. Slowloris)

There may be other edge cases as well. Also note that this only applies to requests without a body. For POST requests, you first wait for the two CRLF tokens, then read as many additional bytes as the Content-Length header specifies. This gets even more complicated when the client is using multipart encoding.
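To illustrate the Content-Length step, here is a rough sketch of extracting the header value once the header section has been received. It assumes C++17 and a well-formed header block; the names StartsWithIgnoreCase and ParseContentLength are hypothetical, and a real parser would need to handle more cases (for example, duplicate headers).

```cpp
#include <cctype>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: true if `text` starts with `lowerPrefix`, ignoring
// ASCII case. `lowerPrefix` must already be lowercase.
bool StartsWithIgnoreCase(std::string_view text, std::string_view lowerPrefix)
{
    if (text.size() < lowerPrefix.size()) { return false; }
    for (std::size_t i{0}; i < lowerPrefix.size(); ++i)
    {
        if (std::tolower(static_cast<unsigned char>(text[i])) != lowerPrefix[i])
        {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: returns the Content-Length value found in `headers`
// (request line plus header lines), or std::nullopt if absent or malformed.
std::optional<std::size_t> ParseContentLength(std::string_view headers)
{
    constexpr std::string_view name{"content-length:"};
    std::size_t lineStart{headers.find("\r\n")}; // End of the request line.
    while (lineStart != std::string_view::npos)
    {
        lineStart += 2; // Step over the CRLF to the start of the next line.
        const std::size_t lineEnd{headers.find("\r\n", lineStart)};
        if (lineEnd == std::string_view::npos || lineEnd == lineStart)
        {
            break; // Incomplete line or the empty line that ends the headers.
        }
        const std::string_view line{headers.substr(lineStart, lineEnd - lineStart)};
        if (StartsWithIgnoreCase(line, name))
        {
            std::string_view value{line.substr(name.size())};
            while (!value.empty() && (value.front() == ' ' || value.front() == '\t'))
            {
                value.remove_prefix(1);
            }
            while (!value.empty() && (value.back() == ' ' || value.back() == '\t'))
            {
                value.remove_suffix(1);
            }
            std::size_t length{0};
            const auto [ptr, ec] =
                std::from_chars(value.data(), value.data() + value.size(), length);
            if (ec != std::errc{} || ptr != value.data() + value.size())
            {
                return std::nullopt; // Not a plain decimal number.
            }
            return length;
        }
        lineStart = lineEnd;
    }
    return std::nullopt;
}
```

If the headers end at offset headerEnd in the receive buffer, the request is complete once the buffer holds at least headerEnd plus the parsed length bytes. This sketch does not cover bodies sent with Transfer-Encoding: chunked, which HTTP/1.1 also allows and which carries no Content-Length.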

answered Nov 14 '22 22:11

Aurel Bílý
