Decompressing GZip in Haskell

Tags:

haskell

I'm having a hard time figuring this out. Here's what I'm trying:

ghci> :m +System.FileArchive.GZip  -- From the "MissingH" package
ghci> fmap decompress $ readFile "test.html.gz"
*** Exception: test.html.gz: hGetContents: invalid argument (invalid byte sequence)

Why am I getting that exception?

I also tried Codec.Compression.GZip.decompress from the zlib package, but I can't get the types to work out to String instead of ByteString.

asked Apr 09 '12 23:04 by Snowball


1 Answer

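You get that exception because Prelude's readFile opens the file in text mode and tries to decode its bytes using your locale's encoding. A gzip file is arbitrary binary data, so the decoding fails with "invalid byte sequence" before decompress ever sees the contents. Read the file as a ByteString instead and convert to String only after decompressing.

The conversion from ByteString to String depends on the character encoding of the decompressed data, but assuming it's ASCII or Latin-1, this should work: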

import Codec.Compression.GZip (decompress)
import qualified Data.ByteString.Lazy as LBS
import Data.ByteString.Lazy.Char8 (unpack)

readGZipFile :: FilePath -> IO String
readGZipFile path = fmap (unpack . decompress) $ LBS.readFile path

If you need to work with some other encoding like UTF-8, replace unpack with an appropriate decoding function, e.g. Data.ByteString.Lazy.UTF8.toString.
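
As a rough sketch of the UTF-8 case, the version below decodes with the text package rather than utf8-string (a substitution on my part; either should do). The names gunzipUtf8 and readGZipFileUtf8 are made up for illustration, and the choice of lenientDecode, which replaces invalid bytes with U+FFFD instead of throwing, is an assumption about what you want for messy input.

```haskell
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy as LBS
import qualified Data.Text.Lazy as TL
import Data.Text.Encoding.Error (lenientDecode)
import Data.Text.Lazy.Encoding (decodeUtf8With)

-- Pure step: gunzip the bytes, then decode them as UTF-8.
-- lenientDecode substitutes U+FFFD for invalid sequences (assumed policy).
gunzipUtf8 :: LBS.ByteString -> String
gunzipUtf8 = TL.unpack . decodeUtf8With lenientDecode . GZip.decompress

-- IO wrapper: read the file as raw bytes, never through Prelude.readFile.
readGZipFileUtf8 :: FilePath -> IO String
readGZipFileUtf8 path = fmap gunzipUtf8 (LBS.readFile path)
```

Keeping the pure function separate from the file read makes it easy to try in ghci on bytes you already have, e.g. gunzipUtf8 (GZip.compress someBytes).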

Of course, if the file you're decompressing isn't a text file, it's better to keep it as a ByteString.

answered Sep 29 '22 02:09 by hammar