
I want to extract a .tgz file and extract any subdirectories that have files that are .tgz and .tar

I'm using the code below to extract .tgz files. The log archives (.tgz) that I need to extract contain subdirectories that hold other .tgz and .tar files, and I want to extract those too.

Ultimately, I'm trying to search for certain strings in all .log files and .txt files that may appear in a .tgz file.

Below is the code that I'm using to extract the top-level .tgz file. I've been trying to work out how to extract the nested archives (.tgz and .tar) as well, so far without success.

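import os, sys, tarfile

try:
    # open the top-level archive named on the command line (given without its .tgz suffix)
    tar = tarfile.open(sys.argv[1] + '.tgz', 'r:gz')
    for item in tar:
        tar.extract(item)
    tar.close()
    print('Done.')
except (IndexError, IOError, tarfile.TarError):
    # no argument given, or the archive could not be read: show usage
    name = os.path.basename(sys.argv[0])
    print(name[:name.rfind('.')] + ' <filename>')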
suffa asked May 19 '11 12:05

1 Answer

This should give you the desired result:

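import os, sys, tarfile

def extract(tar_url, extract_path='.'):
    print(tar_url)
    # mode 'r' lets tarfile detect the compression, so .tgz and plain .tar both open
    tar = tarfile.open(tar_url, 'r')
    for item in tar:
        tar.extract(item, extract_path)
        # item.name is relative to extract_path, so build the real on-disk path
        item_path = os.path.join(extract_path, item.name)
        if item.isfile() and item.name.endswith(('.tgz', '.tar')):
            # unpack the nested archive into the directory it was extracted to
            extract(item_path, os.path.dirname(item_path))
    tar.close()

try:
    extract(sys.argv[1] + '.tgz')
    print('Done.')
except (IndexError, IOError, tarfile.TarError):
    # no argument given, or an archive could not be read: show usage
    name = os.path.basename(sys.argv[0])
    print(name[:name.rfind('.')] + ' <filename>')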

As @cularis said, this is called recursion: the function calls itself on every nested archive it finds.
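For the end goal mentioned in the question (finding certain strings in every .log and .txt file), something along these lines might work once the archives are unpacked. This is only a minimal sketch, assuming Python 3 and plain substring matching; `search_tree`, `needles`, and `extensions` are hypothetical names introduced for illustration and are not part of the code above.

```python
import os

def search_tree(root, needles, extensions=('.log', '.txt')):
    """Yield (path, line_number, line) for every line containing any needle.

    Hypothetical helper: `extensions` must be a tuple so it can be passed
    to str.endswith, and matching is a simple substring test.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(extensions):
                continue  # skip anything that is not a .log or .txt file
            path = os.path.join(dirpath, filename)
            # errors='replace' keeps the scan going if a log has odd bytes
            with open(path, 'r', errors='replace') as handle:
                for line_number, line in enumerate(handle, start=1):
                    if any(needle in line for needle in needles):
                        yield path, line_number, line.rstrip('\n')
```

Calling `list(search_tree('.', ['ERROR']))` after the extraction step would collect every matching line under the current directory; the search string here is just an example. If patterns are needed rather than fixed strings, the `in` test could be swapped for the `re` module.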

berni answered Sep 20 '22 17:09