
How to correctly parse incoming HTTP requests

I've created a C++ application using Winsock that includes a small HTTP server (it handles just the few features I need). It is used to communicate with the outside world via HTTP requests. It works, but sometimes requests are not handled correctly because the parsing fails. I'm quite sure the requests are correctly formed, since they are sent by major web browsers such as Firefox and Chrome, or by Perl/C# programs (which use HTTP modules/DLLs).

After some debugging I found out that the problem is in fact in receiving the message. When the message arrives in more than one part (it is not read in a single recv() call), the parsing sometimes fails. I have tried numerous ways to resolve this, but nothing seems to be reliable enough.

What I do now is read data until I find the "\r\n\r\n" sequence, which indicates the end of the header. If WSAGetLastError() reports anything other than 10035 (WSAEWOULDBLOCK) before that sequence is found, I treat the connection as closed or failed and discard the message. Once I know I have the whole header, I parse it and look for information about the body length. However, I'm not sure whether this information is mandatory (I think not) and what I should do if it is missing: does that mean there will be no body? Another problem is that I do not know whether I should look for a "\r\n\r\n" after the body (if its length is greater than zero).

Does anybody know how to reliably parse an HTTP message?

Note: I know there are implementations of HTTP servers out there. I want my own for various reasons. And yes, reinventing the wheel is bad, I know that too.

PeterK asked Sep 13 '10 07:09



2 Answers

If you're set on writing your own parser, I'd take the Zed Shaw approach: use the Ragel state machine compiler and build your parser based on that. Ragel can handle input arriving in chunks, if you're careful.
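To illustrate why a state machine copes with chunked input, here is a minimal hand-written sketch (not Ragel output, and not a full HTTP parser). It only tracks how much of "\r\n\r\n" has been seen so far, so the terminator is still found when it is split across several recv() calls. The class name `HeaderEndScanner` and its methods are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: remembers how many bytes of "\r\n\r\n" have been
// matched, so the match survives being split across recv() calls.
class HeaderEndScanner {
public:
    // Feeds one received chunk. Returns the offset just past the terminator
    // within this chunk, or std::string::npos if the header has not ended yet.
    std::size_t feed(const char* data, std::size_t len) {
        if (state_ == 4) return 0;  // header already ended before this chunk
        for (std::size_t i = 0; i < len; ++i) {
            const char c = data[i];
            switch (state_) {
            case 0: state_ = (c == '\r') ? 1 : 0; break;                   // nothing matched
            case 1: state_ = (c == '\n') ? 2 : (c == '\r') ? 1 : 0; break; // saw "\r"
            case 2: state_ = (c == '\r') ? 3 : 0; break;                   // saw "\r\n"
            case 3: state_ = (c == '\n') ? 4 : (c == '\r') ? 1 : 0; break; // saw "\r\n\r"
            }
            if (state_ == 4) return i + 1;
        }
        return std::string::npos;
    }

    bool done() const { return state_ == 4; }

private:
    int state_ = 0;  // number of terminator bytes matched so far (0..4)
};
```

You would call `feed()` with each buffer that recv() returns; anything in the chunk past the returned offset already belongs to the body (or to the next request).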

Honestly, though, I'd just use something like this.

Your go-to resource should be RFC 2616, which describes HTTP/1.1 and gives you what you need to construct a parser. Good luck!

Jack Kelly answered Nov 01 '22 16:11



You could try looking at the code of existing HTTP servers to see how they handle an HTTP message.

Or you could look at the spec; there are message length fields you should use. Only buggy browsers send additional CRLFs at the end, apparently.
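As a rough sketch of using the length field: keep appending whatever recv() returns to one buffer and, after each append, check whether the buffer holds a complete request. RFC 2616 (section 4.3) says a request body is signaled by a Content-Length or Transfer-Encoding header, so this sketch treats a missing Content-Length as "no body" and does not look for any CRLF after the body. The names `Framing`, `declaredBodyLength` and `checkRequest` are invented for the example, and Transfer-Encoding: chunked is deliberately not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class Framing { NeedMore, Complete, Malformed };

// Case-insensitive comparison; header field names are not case-sensitive.
static bool iequals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) return false;
    return true;
}

// Returns the declared body length, 0 when no Content-Length field is
// present, or -1 when the value is not a plain decimal number.
long long declaredBodyLength(const std::string& head) {
    std::size_t pos = head.find("\r\n");  // end of the request line
    while (pos != std::string::npos) {
        pos += 2;                          // start of the next header line
        const std::size_t eol = head.find("\r\n", pos);
        const std::string line = head.substr(
            pos, eol == std::string::npos ? std::string::npos : eol - pos);
        const std::size_t colon = line.find(':');
        if (colon != std::string::npos &&
            iequals(line.substr(0, colon), "Content-Length")) {
            const std::size_t b = line.find_first_not_of(" \t", colon + 1);
            if (b == std::string::npos) return -1;           // empty value
            const std::size_t e = line.find_last_not_of(" \t");
            const std::string digits = line.substr(b, e - b + 1);
            if (digits.size() > 18) return -1;               // avoid overflow
            for (char ch : digits)
                if (!std::isdigit(static_cast<unsigned char>(ch))) return -1;
            return std::stoll(digits);
        }
        pos = eol;
    }
    return 0;
}

// Inspects everything received so far. On Complete, totalSize is the number
// of bytes (header + body) that make up the request.
Framing checkRequest(const std::string& buf, std::size_t& totalSize) {
    const std::size_t headEnd = buf.find("\r\n\r\n");
    if (headEnd == std::string::npos) return Framing::NeedMore;
    const long long body = declaredBodyLength(buf.substr(0, headEnd));
    if (body < 0) return Framing::Malformed;
    totalSize = headEnd + 4 + static_cast<std::size_t>(body);
    return buf.size() >= totalSize ? Framing::Complete : Framing::NeedMore;
}
```

In a receive loop you would call `checkRequest` after every recv(), keep reading while it returns `NeedMore`, and parse the first `totalSize` bytes once it returns `Complete`. Searching the whole buffer each time is simple but not the fastest option; it is kept that way here for clarity.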

gbjbaanb answered Nov 01 '22 15:11
