Mixing ByteString parsing and network IO in Haskell

Tags:

haskell

Background

I'm trying to write a client for a binary network protocol. All network operations are carried out over a single TCP connection, so in that sense the input from the server is a continuous stream of bytes. At the application layer, however, the server conceptually sends a packet on the stream, and the client keeps reading until it knows the packet has been received in its entirety, before sending a response of its own.

A lot of the effort needed to make this work involves parsing and generating binary data, for which I'm using the Data.Serialize module.

The problem

The server sends me a "packet" on the TCP stream. The packet is not necessarily terminated by a newline, nor is it of a predetermined size. It does consist of a predetermined number of fields, and fields generally begin with a 4 byte number describing the length of that field. With some help from Data.Serialize, I already have the code to parse a ByteString version of this packet into a more manageable type.

I'd love to be able to write some code with these properties:

  1. The parsing is only defined once, preferably in my Serialize instance(s). I'd rather not do extra parsing in the IO monad to read the correct number of bytes.
  2. When I try to parse a given packet and not all the bytes have arrived yet, lazy IO will just wait for the extra bytes to arrive.
  3. Conversely, when I try to parse a given packet and all its bytes have arrived, the IO does not block any longer. That is, I want to read just enough of the stream from the server to parse my type and form a response to send back. If the IO blocks even after enough bytes have arrived to parse my type, then the client and server will become deadlocked, each waiting for more data from the other.
  4. After I send my own response, I can repeat the process by parsing the next type of packet I expect from the server.

So in brief, is it possible to leverage my current ByteString parsing code in combination with lazy IO to read exactly the right number of bytes off the network?

What I've Tried

I tried to use lazy ByteStrings in combination with my Data.Serialize instance, like so:

import Network
import System.IO
import qualified Data.ByteString.Lazy as L
import Data.Serialize

data MyType

instance Serialize MyType

main = withSocketsDo $ do
  h <- connectTo server port
  hSetBuffering h NoBuffering
  inputStream <- L.hGetContents h
  let Right parsed = decodeLazy inputStream :: Either String MyType
  -- Then use parsed to form my own response, then wait for the server reply...

This seems to fail mostly on point 3 above: it stays blocked even after a sufficient number of bytes have arrived to parse MyType. I strongly suspect this is because ByteStrings are read with a given block size at a time, and L.hGetContents is waiting for the rest of this block to arrive. While this property of reading an efficient blocksize is helpful for making efficient reads from disk, it seems to be getting in my way for reading just enough bytes to parse my data.

asked Mar 12 '13 06:03 by Chris Rice


1 Answer

Something is wrong with your parser: it is too eager. Most likely it needs the next byte after the message for some reason. hGetContents from bytestring doesn't block waiting for a whole chunk to fill; it uses hGetSome internally, which returns as soon as any data is available.

I created a simple test case. The server sends "hello" every second:
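
One way to check whether a parser is too eager, without involving the network at all, is to hand it exactly the bytes of one message and see whether it finishes or asks for more. The sketch below assumes cereal 0.4.1 or later (where `Fail` carries two fields). `getField` and `probe` are hypothetical names, and the field layout (a 4-byte big-endian length followed by the payload) is an assumption based on the description in the question, not your actual format.

```haskell
import qualified Data.ByteString as B
import Data.Serialize.Get (Get, Result (..), getByteString, getWord32be, runGetPartial)

-- Hypothetical field: 4-byte big-endian length, then that many payload bytes.
-- (Byte order is an assumption; the question does not state it.)
getField :: Get B.ByteString
getField = do
  len <- getWord32be
  getByteString (fromIntegral len)

-- Report what the parser does when given exactly these bytes and nothing more.
probe :: Get a -> B.ByteString -> String
probe parser input =
  case runGetPartial parser input of
    Done _ rest -> "done, " ++ show (B.length rest) ++ " byte(s) left over"
    Partial _   -> "wants more input"
    Fail msg _  -> "failed: " ++ msg
```

If `probe` reports "wants more input" for a byte string that holds one complete packet, the parser is asking for data beyond the end of the packet, which would make a network client block.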

import Control.Concurrent
import System.IO
import Network

port :: Int
port = 1234

main :: IO ()
main = withSocketsDo $ do
  s <- listenOn $ PortNumber $ fromIntegral port
  (h, _, _) <- accept s

  let loop :: Int -> IO ()
      loop 0 = return ()
      loop i = do
        hPutStr h "hello"
        threadDelay 1000000
        loop $ i - 1
  loop 5

  sClose s

The client reads the whole contents lazily:

import qualified Data.ByteString.Lazy as BSL
import System.IO
import Network

port :: Int
port = 1234

main :: IO ()
main = withSocketsDo $ do
  h <- connectTo "localhost" $ PortNumber $ fromIntegral port
  bs <- BSL.hGetContents h
  BSL.putStrLn bs
  hClose h

If you run both of them, you'll see the client print "hello" every second. So the network subsystem is fine and the issue is somewhere else, most likely in your parser.
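
If lazy IO turns out not to be essential, one alternative is to drive the same `Get` parser incrementally and read from the handle only when the parser asks for more input. This is a minimal sketch, again assuming cereal 0.4.1 or later. `parseWith` and `chunkSource` are hypothetical names; `chunkSource` is only a stand-in for a socket so the loop can be tried without a server.

```haskell
import qualified Data.ByteString as B
import Data.IORef (atomicModifyIORef', newIORef)
import Data.Serialize.Get

-- Run a parser, calling 'readChunk' only when the parser needs more bytes.
-- Returns the parsed value and any unconsumed bytes, which should be passed
-- as 'leftover' to the next call. An empty chunk is treated as end of input.
parseWith :: IO B.ByteString -> Get a -> B.ByteString
          -> IO (Either String (a, B.ByteString))
parseWith readChunk parser leftover = go (runGetPartial parser leftover)
  where
    go (Done x rest) = return (Right (x, rest))
    go (Fail msg _)  = return (Left msg)
    go (Partial k)   = readChunk >>= go . k

-- Stand-in for a socket: hands out the given chunks one at a time,
-- then empty strings once they are exhausted.
chunkSource :: [B.ByteString] -> IO (IO B.ByteString)
chunkSource chunks = do
  ref <- newIORef chunks
  return $ atomicModifyIORef' ref $ \cs -> case cs of
    []        -> ([], B.empty)
    (c : cs') -> (cs', c)
```

With a real connection, `readChunk` could be something like `B.hGetSome h 4096`, which returns as soon as any data is available. Because the loop stops at the first `Done`, it never waits for bytes beyond what the parser asked for, and the parsing logic still lives only in the `Get` parser.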

answered Sep 21 '22 00:09 by Yuras