Difference between Data.ByteString and Data.ByteString.Char8

I read that Char8 only supports ASCII characters and is dangerous to use with other Unicode characters. However, this program

{-# LANGUAGE OverloadedStrings #-}

--import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text.IO as TIO
import qualified Data.Text.Encoding as E
import qualified Data.Text as T

name :: T.Text
name = "{ \"name\": \"哈时刻\" }"

nameB :: BC.ByteString
nameB = E.encodeUtf8 name

main :: IO ()
main = do
  BC.writeFile "test.json" nameB
  putStrLn "done"

produces the same result as

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
--import qualified Data.ByteString.Char8 as BC
import qualified Data.Text.IO as TIO
import qualified Data.Text.Encoding as E
import qualified Data.Text as T

name :: T.Text
name = "{ \"name\": \"哈时刻\" }"

nameB :: B.ByteString
nameB = E.encodeUtf8 name

main :: IO ()
main = do
  B.writeFile "test.json" nameB
  putStrLn "done"

So what is the difference between using Data.ByteString.Char8 and Data.ByteString?

laiboonh asked Nov 23 '17 02:11




1 Answer

If you compare Data.ByteString and Data.ByteString.Char8, you'll notice that a bunch of functions that reference Word8 in the former reference Char in the latter.

-- Data.ByteString
map :: (Word8 -> Word8) -> ByteString -> ByteString
cons :: Word8 -> ByteString -> ByteString
snoc :: ByteString -> Word8 -> ByteString
head :: ByteString -> Word8
uncons :: ByteString -> Maybe (Word8, ByteString) 
{- and so on... -}


-- Data.ByteString.Char8
map :: (Char -> Char) -> ByteString -> ByteString
cons :: Char -> ByteString -> ByteString
snoc :: ByteString -> Char -> ByteString
head :: ByteString -> Char
uncons :: ByteString -> Maybe (Char, ByteString) 
{- and so on... -}

For these functions, and these functions only, Data.ByteString.Char8 provides the convenience of not having to constantly convert Word8 values to and from Char values. Both modules share the same ByteString type, and writeFile does exactly the same thing in both.
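
To make the conversion concrete, here is a minimal sketch of what the Char8 functions save you from writing by hand, and of where the "dangerous" reputation comes from: a Char above '\255' is silently cut down to its low 8 bits. The names sample, headAsChar and truncated are made up for this illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (chr, ord)

-- Both modules export the very same ByteString type, so a value built
-- with one module can be passed to functions from the other.
sample :: B.ByteString
sample = BC.pack "abc"

-- Roughly what BC.head does, spelled out with the Word8 API.
headAsChar :: B.ByteString -> Char
headAsChar = chr . fromIntegral . B.head

-- Char8 keeps only the low 8 bits of each Char.
-- '哈' is U+54C8, so it ends up as the single byte 0xC8 (200).
truncated :: B.ByteString
truncated = BC.pack "哈"

main :: IO ()
main = do
  print (B.unpack sample)                      -- [97,98,99]
  print (headAsChar sample == BC.head sample)  -- True
  print (B.unpack truncated)                   -- [200]
  print (ord '哈' `mod` 256)                   -- 200
```

So Char8 is safe as long as every Char you feed it is in the range '\0' to '\255'; anything beyond that loses information without an error.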

Here is a nice way of seeing the different behaviours of similar functions in Text, ByteString, and ByteString.Char8:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text.Encoding

import qualified Data.Text as T
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

nameText :: T.Text
nameText = "哈时刻"

nameByteString :: B.ByteString
nameByteString = encodeUtf8 nameText

main :: IO ()
main = do
  print $ T.head nameText               -- '\21704'     actual first character
  print $ B.head nameByteString         -- 229          first byte
  print $ BC.head nameByteString        -- '\229'       first byte as character

  putStrLn [ T.head nameText ]          -- 哈           actual first character
  putStrLn [ BC.head nameByteString ]   -- å            first byte as character
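
If what you actually want is the first character rather than the first byte, one option is to decode the bytes back into Text before doing any character-level work. This is only a sketch that assumes the bytes are UTF-8, as they are in the example above; decodeUtf8' is used so that invalid input shows up as a Left value instead of an exception.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)

nameText :: T.Text
nameText = "哈时刻"

main :: IO ()
main = do
  let bytes = encodeUtf8 nameText
  print (B.length bytes)     -- 9: each of these characters takes three bytes
  print (T.length nameText)  -- 3
  case decodeUtf8' bytes of
    Left err -> putStrLn ("invalid UTF-8: " ++ show err)
    Right t  -> print (T.head t == T.head nameText)  -- True
```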

Alec answered Sep 23 '22 06:09