
Handling nested zip files with archive/zip

Tags: zip, go

I'm struggling to handle nested zip files in Go (where a zip file contains another zip file). I'm trying to recurse a zip file and list all of the files it contains.

archive/zip gives you two functions for opening a zip file:

  • zip.NewReader
  • zip.OpenReader

OpenReader opens a file on disk. NewReader accepts an io.ReaderAt and a file size. As you iterate through the zipped files with either of these, you get a zip.File for each file inside the zip. To get the contents of file f, you call f.Open, which gives you an io.ReadCloser. To open a nested zip file, I'd need to use NewReader, but neither zip.File nor the io.ReadCloser returned by f.Open satisfies the io.ReaderAt interface.

zip.File has an unexported field zipr, which is an io.ReaderAt, and zip.ReadCloser has an unexported field f, which is an *os.File. Either would satisfy the requirements for NewReader.

My question: is there any way to open a nested zip file without first writing the contents to a file on disk or reading the whole thing into memory?

It looks like everything that is needed is available in zip.File, but isn't exported. I'm hoping I missed something.

freb asked Feb 13 '26 03:02


1 Answer

How about building an io.ReaderAt on top of an io.Reader that reinitializes whenever you go backwards? (This code is largely untested, but hopefully you get the idea.)

package main

import (
    "io"
    "os"
    "strings"
)

// inefficientReaderAt adapts a forward-only stream to io.ReaderAt by
// reopening the stream whenever a read would go backwards.
type inefficientReaderAt struct {
    rdr    io.ReadCloser
    cur    int64 // offset of the next byte rdr will return
    initer func() (io.ReadCloser, error)
}

func newInefficientReaderAt(initer func() (io.ReadCloser, error)) *inefficientReaderAt {
    return &inefficientReaderAt{
        initer: initer,
    }
}

// Read reads from the current position and advances cur.
func (r *inefficientReaderAt) Read(p []byte) (n int, err error) {
    n, err = r.rdr.Read(p)
    r.cur += int64(n)
    return n, err
}

func (r *inefficientReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
    // reset on rewind
    if off < r.cur || r.rdr == nil {
        if r.rdr != nil {
            r.rdr.Close() // release the stream being abandoned
        }
        r.cur = 0
        r.rdr, err = r.initer()
        if err != nil {
            r.rdr = nil
            return 0, err
        }
    }

    // skip forward to off; copying from r (not r.rdr) keeps cur in sync
    if off > r.cur {
        if _, err = io.CopyN(io.Discard, r, off-r.cur); err != nil {
            return 0, err
        }
    }

    // a single Read may return fewer than len(p) bytes
    return r.Read(p)
}

func main() {
    r := newInefficientReaderAt(func() (io.ReadCloser, error) {
        return io.NopCloser(strings.NewReader("ABCDEFG")), nil
    })

    io.Copy(os.Stdout, io.NewSectionReader(r, 0, 3))
    io.Copy(os.Stdout, io.NewSectionReader(r, 1, 3))
}

If you mostly move forwards, this probably works OK, especially if you use a buffered reader.

  • I should note that this violates the io.ReaderAt guarantees (https://godoc.org/io#ReaderAt): it doesn't allow parallel calls to ReadAt, and it can return a short read instead of blocking until the buffer is full, so this may not even work properly.
Caleb answered Feb 14 '26 15:02


