Web server - how to parse requests? Asynchronous Stream Tokenizer?

I'm attempting to create a simple web server in C#, written in an asynchronous socket programming style. The purpose is very narrow - a Comet server (HTTP long-polling).

I've got the Windows service running, accepting connections, dumping request info to the Console and returning simple fixed content to the client.

Now, I can't figure out a manageable strategy for parsing the request data asynchronously and safely. I've written synchronous LL(1) parsers before, but I'm not sure whether an LL(1) parser is appropriate or necessary for HTTP, and I don't know how to tokenize the input stream asynchronously. All I can think of is keeping an input buffer per client, reading into that, copying it to a StringBuilder, and periodically checking whether I have a complete request. But that seems inefficient and might lead to code that is difficult to debug and maintain.

Also, the connection has two phases: receiving the request in full, and then sending a response (in this case, after some delay). I plan to enroll the connection in the long-polling manager only once the request is validated and actionable. However, a misbehaving client could continue to send data and fill up a buffer, so I think I need to continue to monitor and empty the input stream during the response phase, right?

Any guidance on this is appreciated.

I guess the first step is knowing whether it is possible to efficiently tokenize a network stream asynchronously and without a large intermediate buffer. Even without a proper parser, the same challenges of creating a tokenizer apply to reading "lines" of input at a time, or even reading until double blank lines (one big token). I don't want to read one byte at a time from the network, but neither do I want to read too many bytes and have to store them in some intermediate buffer, right?

Jason Kleban asked Nov 05 '22 02:11


1 Answer

For HTTP, the simplest approach is to read the headers into memory completely (until you receive \r\n\r\n), then split on \r\n to get the individual lines. The first line is the request line; each remaining line is a header, which you split at the first : to separate the name from the value.
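
As a rough illustration of the splitting step, here is a minimal sketch in Python (the same structure carries over to C#). The function name parse_request_head and the sample request are made up for this example, and the sketch assumes the complete header block has already been received. It ignores obsolete folded header lines and repeated header names.

```python
def parse_request_head(head: bytes):
    """Split a complete request head (the bytes before the blank line).

    Hypothetical helper for illustration, not part of any library.
    """
    # ISO-8859-1 maps every byte to a character, so decoding cannot fail.
    text = head.decode("iso-8859-1")
    lines = text.split("\r\n")

    # Request line: method, target, version (raises ValueError if malformed).
    method, target, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        if not line:
            continue
        # Split at the FIRST colon only; values such as "host:port" contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# Sample input, invented for this example.
raw = b"GET /poll?id=7 HTTP/1.1\r\nHost: example.com:8080\r\nAccept: */*\r\n\r\n"
head, _, rest = raw.partition(b"\r\n\r\n")
method, target, version, headers = parse_request_head(head)
```

Header names are lower-cased here because HTTP header names are case-insensitive; whether to keep only the last value for a repeated name is a design choice this sketch does not address.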

There's no need to use a complex parser for that.
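
Since asynchronous reads can deliver the request in arbitrary pieces, the "read until \r\n\r\n" step needs a small per-connection buffer. The following is one possible sketch in Python, not a definitive implementation: the class name RequestBuffer, the feed method, and the 8192-byte cap are all assumptions chosen for the example. Each completed read is passed to feed, which returns the header block once the terminator has arrived, and rejects a client that keeps sending without ever finishing its headers.

```python
class RequestBuffer:
    """Per-connection buffer that collects bytes until the header terminator.

    Hypothetical class for illustration; the size cap is an assumed value.
    """

    TERMINATOR = b"\r\n\r\n"

    def __init__(self, max_head_size: int = 8192):
        self.max_head_size = max_head_size
        self._buf = bytearray()
        self._scan_from = 0  # bytes before this offset were already searched

    def feed(self, chunk: bytes):
        """Append one received chunk; return the head once complete, else None."""
        self._buf += chunk
        end = self._buf.find(self.TERMINATOR, self._scan_from)

        # Enforce the cap whether or not the terminator has been seen yet.
        head_len = len(self._buf) if end == -1 else end
        if head_len > self.max_head_size:
            raise ValueError("request head too large")

        if end == -1:
            # Keep a 3-byte overlap so a terminator split across two reads
            # is still found without rescanning the whole buffer.
            self._scan_from = max(0, len(self._buf) - (len(self.TERMINATOR) - 1))
            return None

        head = bytes(self._buf[:end])
        # Drop the head and terminator; bytes after them stay buffered.
        del self._buf[:end + len(self.TERMINATOR)]
        self._scan_from = 0
        return head


# Simulated sequence of reads, invented for this example.
buf = RequestBuffer()
first = buf.feed(b"GET /poll HTTP/1.1\r\nHost: exa")   # incomplete
second = buf.feed(b"mple.com\r\n\r")                    # terminator split
third = buf.feed(b"\nextra")                            # completes the head
```

In this sketch the caller would close the connection when feed raises, which bounds the memory a misbehaving client can consume. The buffer only ever holds one header block plus whatever arrived in the same read, so it stays small for a long-polling server that expects short GET requests.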

ThiefMaster answered Nov 10 '22 16:11