I have a large tar.gz file to analyze using a Python script. The tar.gz file contains a number of zip files, which might in turn embed other .gz files. Before extracting the file, I would like to walk through the directory structure within the compressed files to see if certain files or directories are present. Looking at the tarfile and zipfile modules, I don't see any existing function that allows me to get a table of contents of a zip file within a tar.gz file.
Appreciate your help,
You can't get at it without extracting the file. However, you don't need to extract it to disk if you don't want to. You can use the tarfile.TarFile.extractfile method to get a file-like object that you can then pass to tarfile.open as the fileobj argument. For example, given these nested tarfiles:
$ cat bar/baz.txt
This is bar/baz.txt.
$ tar cvfz bar.tgz bar
bar/
bar/baz.txt
$ tar cvfz baz.tgz bar.tgz
bar.tgz
You can access files from the inner one like so:
>>> import tarfile
>>> baz = tarfile.open('baz.tgz')
>>> bar = tarfile.open(fileobj=baz.extractfile('bar.tgz'))
>>> bar.extractfile('bar/baz.txt').read()
b'This is bar/baz.txt.\n'
and they're only ever extracted to memory.
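The same idea can be applied to the zip-inside-tar.gz case from the question. The following is only a minimal sketch, not a definitive implementation: the function name list_nested_zip_contents is made up for illustration, and it assumes each inner zip is small enough to be read fully into memory. The bytes are wrapped in io.BytesIO because zipfile.ZipFile needs a seekable file object.

```python
import io
import tarfile
import zipfile


def list_nested_zip_contents(tar_path):
    """Map each .zip member of a tar archive to the names it contains.

    Hypothetical helper for illustration; nothing is written to disk.
    """
    contents = {}
    # "r:*" lets tarfile detect the compression (gzip, bz2, ...) itself.
    with tarfile.open(tar_path, "r:*") as outer:
        for member in outer:
            # Skip directories, links, and anything not named *.zip.
            if not (member.isfile() and member.name.lower().endswith(".zip")):
                continue
            # Read the zip member into memory (assumes it fits in RAM).
            data = outer.extractfile(member).read()
            with zipfile.ZipFile(io.BytesIO(data)) as inner:
                contents[member.name] = inner.namelist()
    return contents
```

Calling list_nested_zip_contents('archive.tar.gz') (the file name here is a placeholder) returns a dict keyed by the zip's path inside the tar, so you can check for a given file or directory prefix before extracting anything. Any .gz entries inside a zip show up by name in that listing; a plain .gz file holds a single compressed stream and has no table of contents of its own.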