
Read lines from a file inside a zip archive using Haskell's zip-conduit

As the title says, I'd like to be able to read lines from a file that is inside a zip archive, using zip-conduit (the zip files I'm dealing with are very big, so I need to be able to do this in constant memory). I grok the very basic idea of conduits, but have never used them in anger, and am feeling quite stuck as to where to start. I've read the conduits tutorial, but I'm having trouble matching that up with my problem.

The zip-conduit documentation says one can source from a zip archive via something like the following:

import qualified Data.Conduit.Binary as CB
import Codec.Archive.Zip

withArchive archivePath $ do
    name:_ <- entryNames
    sourceEntry name $ CB.sinkFile name

I presume what I need to do is write something in place of CB.sinkFile. Data.Conduit.Text has a lines function — could this be used in some way to get the lines out of the file?

I would really appreciate a simple example, say using putStrLn to write out the lines of a simple text file that is archived inside a zip file. Thanks in advance.

Chris asked Nov 21 '13 17:11



2 Answers

Michael Snoyman's answer (below), adapted to read from a zip archive with zip-conduit:

import           Control.Monad.IO.Class (liftIO)
import           Data.Conduit
import qualified Data.Conduit.List as CL
import qualified Data.Conduit.Text as CT
import           Codec.Archive.Zip

main :: IO ()
main = withArchive "input.zip" $ do
  n:_ <- entryNames
  sourceEntry n
     $ CT.decode CT.utf8
    =$ CT.lines
    =$ CL.mapM_ (\t -> liftIO $ putStrLn $ "Got a line: " ++ show t)
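
Two details of the snippet above are worth noting: the pattern match `n:_ <- entryNames` fails at run time if the archive has no entries, and `show t` prints each line as a quoted Haskell string literal rather than as raw text. A minimal sketch of a variant that addresses both, assuming the same zip-conduit and conduit APIs used above (`Data.Text.IO` comes from the text package, and the "archive is empty" message is only illustrative):

```haskell
import           Control.Monad.IO.Class (liftIO)
import           Data.Conduit
import qualified Data.Conduit.List as CL
import qualified Data.Conduit.Text as CT
import qualified Data.Text.IO      as TIO
import           Codec.Archive.Zip

main :: IO ()
main = withArchive "input.zip" $ do
  names <- entryNames
  case names of
    -- Handle an empty archive explicitly instead of failing a pattern match.
    []    -> liftIO $ putStrLn "archive is empty"
    -- Stream the first entry: bytes -> Text -> lines -> stdout.
    (n:_) -> sourceEntry n
                 $ CT.decode CT.utf8
                =$ CT.lines
                =$ CL.mapM_ (liftIO . TIO.putStrLn)
```

Note that `CT.lines` buffers one complete line at a time, so memory use stays bounded only as long as individual lines are of reasonable length.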

tymmym answered Oct 17 '22 18:10


Here's a simple example that reads lines from a plain text file:

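import           Control.Monad.IO.Class (liftIO)
-- Needed on newer conduit versions, where Data.Conduit no longer re-exports runResourceT.
import           Control.Monad.Trans.Resource (runResourceT)
import           Data.Conduit
import qualified Data.Conduit.Binary    as CB
import qualified Data.Conduit.List      as CL
import qualified Data.Conduit.Text      as CT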

main :: IO ()
main = runResourceT
     $ CB.sourceFile "input.txt"
    $$ CT.decode CT.utf8
    =$ CT.lines
    =$ CL.mapM_ (\t -> liftIO $ putStrLn $ "Got a line: " ++ show t)

You can also view and experiment on FP Haskell Center.
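
On more recent conduit releases the `$$` and `=$` operators are deprecated in favour of `.|` and `runConduit`. A rough equivalent of the example above, assuming conduit 1.3 or later and its `Conduit` convenience module (the original answer does not use these names, so treat this as a sketch rather than the author's code):

```haskell
import           Conduit
import qualified Data.Text.IO as TIO

main :: IO ()
main = runConduitRes
     $ sourceFile "input.txt"            -- stream raw bytes from the file
    .| decodeUtf8C                       -- ByteString chunks -> Text chunks
    .| linesUnboundedC                   -- re-chunk the Text stream into lines
    .| mapM_C (liftIO . TIO.putStrLn)    -- print each line as raw text
```

The stages after `sourceFile` do not depend on where the bytes come from, so the same three stages should also be usable as the sink passed to zip-conduit's `sourceEntry`.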

Michael Snoyman answered Oct 17 '22 18:10