As the title says, I'd like to be able to read lines from a file that is inside a zip archive, using zip-conduit (the zip files I'm dealing with are very big, so I need to be able to do this in constant memory). I grok the very basic idea of conduits, but have never used them in anger, and am feeling quite stuck as to where to start. I've read the conduits tutorial, but I'm having trouble matching that up with my problem.
The zip-conduit documentation says one can source from a zip archive via something like the following:
import qualified Data.Conduit.Binary as CB
import Codec.Archive.Zip

withArchive archivePath $ do
    name:_ <- entryNames
    sourceEntry name $ CB.sinkFile name
I presume what I need to do is write something in place of CB.sinkFile. Data.Conduit.Text has a lines function; could this be used in some way to get the lines out of the file?
I would really appreciate a simple example, say using putStrLn to write out the lines of a simple text file that is archived inside a zip file. Thanks in advance.
Use zipfile. It will unzip the files so that you can see them: with zipfile.ZipFile("../input/" + Dataset + ".zip", "r") as z: z.extractall("."). The files can then be read easily from there. I do this and everything works. I used this command with the path I copied via the 'copy file path' button, and it worked for me. I hope it solves problems similar to mine.
If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression. New in version 0.18.1: support for ‘zip’ and ‘xz’ compression.
This depends heavily on your operating system. PowerShell 5 has Expand-Archive, which makes using 7Zip obsolete, but even with Expand-Archive you'd have to extract the whole archive to read your file's contents. On a Windows machine, you could accomplish it with the Shell.Application COM object or System.IO.Compression.FileSystem, as quoted from this thread:
Michael's answer, but with zip-conduit:
import Control.Monad.IO.Class (liftIO)
import Data.Conduit
import qualified Data.Conduit.List as CL
import qualified Data.Conduit.Text as CT
import Codec.Archive.Zip

main :: IO ()
main = withArchive "input.zip" $ do
    n:_ <- entryNames                -- take the first entry in the archive
    sourceEntry n                    -- stream that entry's bytes into the sink below
         $ CT.decode CT.utf8         -- decode the bytes as UTF-8 text
        =$ CT.lines                  -- split the text stream into lines
        =$ CL.mapM_ (\t -> liftIO $ putStrLn $ "Got a line: " ++ show t)
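The pattern n:_ <- entryNames fails at run time if the archive has no entries, and it always takes the first entry. A minimal sketch of a slightly more defensive variant follows, assuming the same zip-conduit and conduit operators used above. The helper pickEntry, the ".txt" suffix, and the file name "input.zip" are hypothetical names chosen for illustration, not part of either library.

```haskell
import Control.Monad.IO.Class (liftIO)
import Data.Conduit
import qualified Data.Conduit.List as CL
import qualified Data.Conduit.Text as CT
import Data.List (find, isSuffixOf)
import Codec.Archive.Zip

-- Hypothetical helper: return the first entry name ending in the given suffix.
pickEntry :: String -> [FilePath] -> Maybe FilePath
pickEntry suffix = find (suffix `isSuffixOf`)

main :: IO ()
main = withArchive "input.zip" $ do          -- "input.zip" is an assumed path
    names <- entryNames
    case pickEntry ".txt" names of
        -- No matching entry: report it instead of failing on a pattern match.
        Nothing -> liftIO $ putStrLn "No .txt entry found in the archive"
        -- Matching entry: stream it line by line, as in the example above.
        Just n  -> sourceEntry n
                     $ CT.decode CT.utf8
                    =$ CT.lines
                    =$ CL.mapM_ (\t -> liftIO $ putStrLn $ "Got a line: " ++ show t)
```

Keeping the selection logic in a pure function makes it easy to check separately from the archive I/O.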
Here's a simple example:
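import Control.Monad.IO.Class (liftIO)
-- runResourceT comes from the resourcet package; older conduit releases
-- also re-exported it from Data.Conduit.
import Control.Monad.Trans.Resource (runResourceT)
import Data.Conduit
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import qualified Data.Conduit.Text as CT

main :: IO ()
main = runResourceT
     $ CB.sourceFile "input.txt"     -- stream the file's bytes
    $$ CT.decode CT.utf8             -- decode the bytes as UTF-8 text
    =$ CT.lines                      -- split the text stream into lines
    =$ CL.mapM_ (\t -> liftIO $ putStrLn $ "Got a line: " ++ show t)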
You can also view and experiment on FP Haskell Center.
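Newer conduit releases (1.3 and later) deprecate $$ and =$ in favour of the single .| operator and the combinators exported from the Conduit module. The following is a rough sketch of the same pipeline in that style, not a drop-in replacement for the code above. The names splitChunks and printFileLines are hypothetical. The pure helper is included only to illustrate that line splitting works across chunk boundaries, which is what lets the pipeline process large inputs without holding the whole file in memory.

```haskell
import Conduit
import Control.Monad.IO.Class (liftIO)
import qualified Data.Text as T

-- Hypothetical pure helper: feed a list of text chunks through the line
-- splitter. A line broken across two chunks is reassembled.
splitChunks :: [T.Text] -> [T.Text]
splitChunks chunks = runConduitPure
     $ yieldMany chunks
    .| linesUnboundedC
    .| sinkList

-- Hypothetical helper: stream a file from disk and print it line by line.
printFileLines :: FilePath -> IO ()
printFileLines path = runConduitRes
     $ sourceFile path               -- stream the file's bytes
    .| decodeUtf8C                   -- decode the bytes as UTF-8 text
    .| linesUnboundedC               -- split the text stream into lines
    .| mapM_C (\t -> liftIO $ putStrLn $ "Got a line: " ++ show t)

main :: IO ()
main =
    -- The second line, "cd", spans the boundary between the two chunks.
    print (splitChunks (map T.pack ["ab\nc", "d\ne"]))
```

Calling printFileLines "input.txt" should behave like the example above, assuming the file exists and is valid UTF-8.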