
Replace newlines in ByteString

I'd like a function that takes a ByteString and replaces the newlines \n and \n\r with commas, but I can't think of a nice way to do it.

import qualified Data.ByteString as BS
import Data.Char (ord) 
import Data.Word (Word8)

endlWord8 = fromIntegral $ ord '\n' :: Word8

replace :: BS.ByteString -> BS.ByteString

I thought of using BS.map, but I can't see how, since I can't pattern match on Word8 values. Another option would be BS.split followed by a join with Word8 commas, but that sounds slow and inelegant. Any ideas?

jorgen asked Oct 28 '22 23:10



1 Answer

Use Data.ByteString.Char8 to get rid of the nasty conversions between Word8 and Char that you would otherwise have to do. According to the first sentence of the Data.ByteString.Char8 documentation, performance should not be affected.
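To illustrate the point about Char8, here is a minimal sketch of the BS.map idea from the question, written against Data.ByteString.Char8 so that the comparison uses plain Char literals. The name newlineToComma is made up for this example, and it only covers the single-character \n case: a mapping function sees one character at a time, so it cannot collapse \n\r into one comma.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper, not part of the answer's code:
-- replaces every '\n' with ',' one character at a time.
-- A following '\r' is left untouched, so "\n\r" becomes ",\r".
newlineToComma :: B.ByteString -> B.ByteString
newlineToComma = B.map (\c -> if c == '\n' then ',' else c)
```

This is why the \n\r requirement needs something that can look at more than one character.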

Additionally, use B.span instead of B.split, since you want to replace \n\r combinations as well, not only \n.
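A small sketch of the difference, using the same qualified import B for Data.ByteString.Char8. The names splitDemo and spanDemo are only for illustration. B.split drops the \n and leaves the \r glued to the front of the next piece, while B.span stops at the \n and hands back the rest of the input intact, so the code can decide whether one or two characters should be consumed.

```haskell
import qualified Data.ByteString.Char8 as B

-- Illustrative values only; the input "a\n\rb" is an assumed example.

-- Splitting on '\n' removes the '\n' but keeps the '\r' on the next piece:
-- the result is the list ["a", "\rb"].
splitDemo :: [B.ByteString]
splitDemo = B.split '\n' (B.pack "a\n\rb")

-- span stops in front of the first '\n' and returns the untouched remainder:
-- the result is the pair ("a", "\n\rb").
spanDemo :: (B.ByteString, B.ByteString)
spanDemo = B.span (/= '\n') (B.pack "a\n\rb")
```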

My own (probably clumsy) attempt to do so:

module Test where

import Data.Monoid ((<>))
import Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Builder as Build
import qualified Data.ByteString.Lazy as LB

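-- Consume one newline ('\n' or "\n\r") from the front of the input.
-- Returns the replacement comma (if a newline was found) and the remaining input.
eatNewline :: ByteString -> (Maybe Char, ByteString)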
eatNewline string
  | B.null string = (Nothing, string)
  | B.head string == '\n' && B.null (B.tail string) = (Just ',', B.empty)
  | B.head string == '\n' && B.head (B.tail string) /= '\r' = (Just ',', B.drop 1 string)
  | B.head string == '\n' && B.head (B.tail string) == '\r' = (Just ',', B.drop 2 string)
  | otherwise = (Nothing, string)

replaceNewlines :: ByteString -> ByteString
replaceNewlines = LB.toStrict . Build.toLazyByteString . go mempty
  where
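    -- Copy everything up to the next '\n', emit a comma for the newline,
    -- and continue with whatever input is left.
    go :: Build.Builder -> ByteString -> Build.Builder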
    go builder string = let (chunk, rest) = B.span (/= '\n') string
                            (c, rest1)    = eatNewline rest
                            maybeComma    = maybe mempty Build.char8 c
                        in if B.null rest1 then
                             builder <> Build.byteString chunk <> maybeComma
                           else
                             go (builder <> Build.byteString chunk <> maybeComma) rest1

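Hopefully mappend for Data.ByteString.Builder is not linear in the number of times mappend has already been applied to one of its operands; otherwise this would be a quadratic algorithm. The Data.ByteString.Builder documentation describes Builder concatenation as O(1), so that should not be the case.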
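If the cost of repeated Builder appends is a concern, one alternative is to avoid the Builder entirely. The following is only a sketch, not the code above: it swaps in B.unfoldrN from Data.ByteString.Char8 to produce the output in a single pass. The name replaceNewlinesUnfold is made up for this example. It relies on the output never being longer than the input, so the input length is a safe upper bound for unfoldrN.

```haskell
import Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as B

-- Hypothetical alternative: replace '\n' and "\n\r" with ','
-- by unfolding the output one character at a time.
replaceNewlinesUnfold :: ByteString -> ByteString
replaceNewlinesUnfold input = fst (B.unfoldrN (B.length input) step input)
  where
    -- Produce the next output character and the input still to be processed.
    step :: ByteString -> Maybe (Char, ByteString)
    step bs = case B.uncons bs of
      Nothing           -> Nothing                    -- input exhausted
      Just ('\n', rest) -> Just (',', dropCR rest)    -- newline becomes a comma
      Just (c, rest)    -> Just (c, rest)             -- copy everything else

    -- Skip a single '\r' that directly follows a '\n'.
    dropCR :: ByteString -> ByteString
    dropCR rest = case B.uncons rest of
      Just ('\r', rest') -> rest'
      _                  -> rest
```

This trades the chunk-wise copying of the Builder version for a per-character loop, so which one is faster on real data would need measuring.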

typetetris answered Nov 15 '22 09:11
