Reading a file from a tar.gz archive in Nim

I'm looking for a way to read a file from a tar.gz archive using the Nim programming language (version 0.11.2). Say I have an archive

/my/path/to/archive.tar.gz

and a file in that archive

my/path/to/archive/file.txt

My goal is to be able to read the contents of the file line by line in Nim. In Python I can do this with the tarfile module. In Nim there are the libzip and zlib modules, but their documentation is minimal and there are no examples. There's also the zipfiles module, but I'm not sure whether it can handle tar.gz archives.

asked Oct 12 '15 13:10 by COM


3 Answers

In a project at my company, we've been using the following module, exposing gzip files as streams:

import
  zlib, streams

type
  GZipStream* = object of StreamObj
    f: GzFile            # zlib handle of the underlying .gz file

  GZipStreamRef* = ref GZipStream

proc fsClose(s: Stream) =
  discard gzclose(GZipStreamRef(s).f)

proc fsReadData(s: Stream, buffer: pointer, bufLen: int): int =
  # gzread returns the number of decompressed bytes read (0 at EOF, -1 on error)
  return gzread(GZipStreamRef(s).f, buffer, bufLen)

proc fsAtEnd(s: Stream): bool =
  return gzeof(GZipStreamRef(s).f) != 0

proc newGZipStream*(f: GzFile): GZipStreamRef =
  new result
  result.f = f
  result.closeImpl = fsClose
  result.readDataImpl = fsReadData
  result.atEndImpl = fsAtEnd
  # other methods are nil: the stream is read-only and cannot seek

proc newGZipStream*(filename: cstring): GZipStreamRef =
  var gz = gzopen(filename, "r")
  if gz == nil:
    raise newException(IOError, "cannot open " & $filename)
  result = newGZipStream(gz)

But you also need to be able to read the tar headers in order to find where the desired file starts in the decompressed stream. You could wrap an existing C library like libtar to do this, or you could roll your own implementation.
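For the roll-your-own route, a minimal sketch might look like the following. It assumes the POSIX ustar / GNU tar header layout: every entry starts with a 512-byte header whose name sits at offset 0 (100 bytes), whose size is an ASCII octal number at offset 124 (12 bytes) and whose type flag is at offset 156; the entry's data follows, padded to a multiple of 512 bytes, and two all-zero blocks end the archive. `readTarEntry`, `field` and `parseOctal` are hypothetical helper names, and GNU long-name (`././@LongLink`) and pax extended headers are not handled. Because it only reads forward, it should work with a non-seekable stream such as the GZipStream above.

```nim
import streams, strutils

const BlockSize = 512  # tar headers and data are laid out in 512-byte blocks

proc field(header: string, start, length: int): string =
  ## Extracts a fixed-width header field, cut at the first NUL byte.
  result = header[start ..< start + length]
  let nul = result.find('\0')
  if nul >= 0:
    result.setLen(nul)

proc parseOctal(s: string): int =
  ## Numeric header fields are ASCII octal, padded with spaces or NULs.
  for c in s.strip(chars = {' ', '\0'}):
    if c notin {'0'..'7'}:
      raise newException(ValueError, "bad octal field: " & s)
    result = result * 8 + (ord(c) - ord('0'))

proc readTarEntry(tar: Stream, wanted: string): string =
  ## Scans an *uncompressed* tar stream for the member `wanted` and returns
  ## its contents. Only forward reads are used (no setPosition needed).
  while true:
    let header = tar.readStr(BlockSize)
    if header.len < BlockSize or header == repeat('\0', BlockSize):
      break                                  # truncated input or end-of-archive block
    var name = field(header, 0, 100)
    if header[257 ..< 263] == "ustar\0":     # POSIX ustar may split long names
      let prefix = field(header, 345, 155)   # into prefix & "/" & name
      if prefix.len > 0:
        name = prefix & "/" & name
    let size = parseOctal(field(header, 124, 12))
    if name == wanted and header[156] in {'0', '\0'}:  # '0' or NUL = regular file
      return (if size > 0: tar.readStr(size) else: "")
    # Skip this entry's data, which is padded up to a whole number of blocks.
    let padded = (size + BlockSize - 1) div BlockSize * BlockSize
    if padded > 0:
      discard tar.readStr(padded)
  raise newException(KeyError, "not in archive: " & wanted)
```

To get the line-by-line behavior from the question, split the result with `splitLines` from strutils, e.g. `for line in readTarEntry(newGZipStream("/my/path/to/archive.tar.gz"), "my/path/to/archive/file.txt").splitLines: echo line`. The whole member is held in memory, and skipped entries are read into a temporary string, which is fine for small archives but wasteful for large ones.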

answered Oct 21 '22 01:10 by zah


To my knowledge, libzip and zlib cannot read tar files on their own: libzip only supports zip archives and zlib only handles the compression layer, while a tar.gz requires gzip decompression plus tar parsing. Unfortunately, it looks like there are no Nim libraries yet that read tar.gz archives.

If you are okay with a quick-and-dirty solution that calls the external tar command, you can do this:

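import os, osproc

proc extractFromTarGz(archive: string, filename: string): string =
  # -x extracts
  # -z runs the archive through gzip
  # -f specifies the archive file name
  # -O writes the extracted file to STDOUT instead of to disk
  # quoteShell keeps paths with spaces or shell metacharacters intact;
  # tar's error messages also end up in the result (stderr is merged)
  result = execProcess("tar -zxf " & quoteShell(archive) & " " &
                       quoteShell(filename) & " -O")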

let content = extractFromTarGz("test.tar.gz", "some/subpath.txt")
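If you stay with the external tar but want the file line by line without building a shell command string, a sketch along these lines passes the arguments directly with `startProcess` and reads tar's stdout as a stream. `tarGzLines` is a hypothetical name, and it assumes a `tar` on the PATH that understands `-z` and `-O` (GNU tar and bsdtar both do).

```nim
import osproc, streams

iterator tarGzLines(archive, member: string): string =
  ## Yields the lines of `member` inside the gzipped tar `archive`,
  ## read incrementally from the external tar's stdout.
  # Arguments are passed directly (no shell), so no quoting is needed.
  let p = startProcess("tar", args = ["-x", "-z", "-O", "-f", archive, member],
                       options = {poUsePath})
  for line in p.outputStream.lines:
    yield line
  let code = p.waitForExit()
  p.close()
  if code != 0:
    raise newException(IOError, "tar exited with code " & $code)
```

Usage: `for line in tarGzLines("test.tar.gz", "some/subpath.txt"): echo line`. Breaking out of the loop early skips the `waitForExit`/`close` calls, so a long-running program that does that should manage the process itself.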

If you want a clean and flexible solution, this would be a good opportunity to write a wrapper for the libarchive library ;).

answered Oct 21 '22 01:10 by bluenote10


I created a basic untar package that may help with this: https://github.com/dom96/untar

answered Oct 21 '22 02:10 by dom96