Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is Haskell/unpack messing with my bytes?

I've built a tiny UDP/protobuf transmitter and receiver. I've spent the morning trying to track down why the protobuf decoding was producing errors, only to find that it was the transmitter (Spoke.hs) which was sending incorrect data.

The code used unpack to turn Lazy.ByteStrings into Strings that the Network package will send. I found unpack in Hoogle. It may not be the function I'm looking for, but its description looks suitable: "O(n) Converts a ByteString to a String."

Spoke.hs produces the following output:

chris@gigabyte:~/Dropbox/haskell-workspace/hub/dist/build/spoke$ ./spoke
45
45
["a","8","4a","6f","68","6e","20","44","6f","65","10","d2","9","1a","10","6a","64","6f","65","40","65","78","61","6d","70","6c","65","2e","63","6f","6d","22","c","a","8","35","35","35","2d","34","33","32","31","10","1"]

While wireshark shows me that the data in the packet is:

0a:08:4a:6f:68:6e:20:44:6f:65:10:c3:92:09:1a:10:6a:64:6f:65:40:65:78:61:6d:70:6c:65:2e:63:6f:6d:22:0c:0a:08:35:35:35:2d:34:33:32:31:10

The length (45) is the same from Spoke.hs and Wireshark.

Wireshark is missing the last byte (value Ox01) and a stream of central values is different (and one byte larger in Wireshark).

"65","10","d2","9" in Spoke.hs vs 65:10:c3:92:09 in Wireshark.

As 0x10 is DLE, it struck me that there's probably some escaping going on, but I don't know why.

I have many years of trust in Wireshark and only a few tens of hours of Haskell experience, so I've assumed that it's the code that's at fault.

Any suggestions appreciated.

-- Spoke.hs:

module Main where

import Data.Bits
import Network.Socket -- hiding (send, sendTo, recv, recvFrom)
-- import Network.Socket.ByteString
import Network.BSD
import Data.List
import qualified Data.ByteString.Lazy.Char8 as B
import Text.ProtocolBuffers.Header (defaultValue, uFromString)
import Text.ProtocolBuffers.WireMessage (messageGet, messagePut)
import Data.Char (ord, intToDigit)
import Numeric

import Data.Sequence ((><), fromList)

import AddressBookProtos.AddressBook
import AddressBookProtos.Person
import AddressBookProtos.Person.PhoneNumber
import AddressBookProtos.Person.PhoneType

data UDPHandle = 
     UDPHandle {udpSocket  :: Socket,
                udpAddress :: SockAddr}
opensocket :: HostName             -- ^ Remote hostname, or localhost
           -> String               -- ^ Port number or name
           -> IO UDPHandle         -- ^ Handle to use for logging
opensocket hostname port =
    do -- Look up the hostname and port.  Either raises an exception
       -- or returns a nonempty list.  First element in that list
       -- is supposed to be the best option.
       addrinfos <- getAddrInfo Nothing (Just hostname) (Just port)
       let serveraddr = head addrinfos

       -- Establish a socket for communication
       sock <- socket (addrFamily serveraddr) Datagram defaultProtocol

       -- Save off the socket, and server address in a handle
       return $ UDPHandle sock (addrAddress serveraddr)

john = Person {
  AddressBookProtos.Person.id = 1234,
  name = uFromString "John Doe",
  email = Just $ uFromString "[email protected]",
  phone = fromList [
    PhoneNumber {
      number = uFromString "555-4321",
      type' = Just HOME
    }
  ]
}

johnStr = B.unpack (messagePut john)

charToHex x = showIntAtBase 16 intToDigit (ord x) ""

main::IO()
main = 
    do udpHandle <- opensocket "localhost" "4567"
       sent <- sendTo (udpSocket udpHandle) johnStr (udpAddress udpHandle)
       putStrLn $ show $ length johnStr
       putStrLn $ show sent
       putStrLn $ show $ map charToHex johnStr
       return ()
like image 834
fadedbee Avatar asked Jun 15 '12 12:06

fadedbee


2 Answers

The documentation I see for the bytestring package lists unpack as converting a ByteString to [Word8], which is not the same as a String. I would expect some byte difference between ByteString and String because String is Unicode data while ByteString is just an efficient array of bytes, but unpack shouldn't be able to produce a String in the first place.

So you're probably falling foul of Unicode conversion here, or at least something's interpreting it as Unicode when the underlying data really isn't and that seldom ends well.

like image 166
Matthew Walton Avatar answered Sep 19 '22 17:09

Matthew Walton


I think you'll want toString and fromString from utf8-string instead of unpack and pack. This blog post was very helpful for me.

like image 40
Beerend Lauwers Avatar answered Sep 22 '22 17:09

Beerend Lauwers