I'm using the code below to extract .tgz files. The type of log files (.tgz) that I need to extract have sub-directories that contain other .tgz files and .tar files. I want to extract those too.
Ultimately, I'm trying to search for certain strings in all .log files and .txt files that may appear in a .tgz file.
Below is the code that I'm using to extract the .tgz file. I've been trying to work out how to extract the sub-files (.tgz and .tar). So far, I've been unsuccessful.
import os, sys, tarfile

try:
    tar = tarfile.open(sys.argv[1] + '.tgz', 'r:gz')
    for item in tar:
        tar.extract(item)
    print 'Done.'
except:
    name = os.path.basename(sys.argv[0])
    print name[:name.rfind('.')], '<filename>'
This should give you the desired result:
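import os, sys, tarfile

def extract(tar_url, extract_path='.'):
    print(tar_url)
    tar = tarfile.open(tar_url, 'r')
    for item in tar:
        tar.extract(item, extract_path)
        # Recurse into nested archives, using their location on disk
        # (member names are relative to extract_path, not to the cwd).
        if item.name.endswith(('.tgz', '.tar')):
            nested = os.path.join(extract_path, item.name)
            extract(nested, os.path.dirname(nested))
    tar.close()

try:
    extract(sys.argv[1] + '.tgz')
    print('Done.')
except IndexError:
    # No archive name was given on the command line: show usage.
    name = os.path.basename(sys.argv[0])
    print('%s <filename>' % name[:name.rfind('.')])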
As @cularis said, this is called recursion: the function calls itself for every nested archive it finds.
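Since the stated goal is to search the .log and .txt files rather than to keep the extracted tree, the same recursion can also be applied without writing anything to disk. The following is only a sketch, assuming Python 3 and archives small enough to read each member into memory. The names search_archive, ARCHIVE_SUFFIXES and TEXT_SUFFIXES, and the "!/" separator used to mark a path inside a nested archive, are made up for illustration.

```python
import io
import tarfile

# Hypothetical suffix lists; adjust them to the archives you actually have.
ARCHIVE_SUFFIXES = ('.tgz', '.tar.gz', '.tar')
TEXT_SUFFIXES = ('.log', '.txt')


def search_archive(fileobj, needles, prefix=''):
    """Yield (member_path, line_number, line) for each line containing a needle.

    fileobj must be a seekable binary file object holding a tar archive.
    Nested archives are opened recursively from memory, not from disk.
    """
    # 'r:*' lets tarfile detect the compression (gzip or none) by itself.
    with tarfile.open(fileobj=fileobj, mode='r:*') as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links and special files
            path = prefix + member.name
            if member.name.endswith(ARCHIVE_SUFFIXES):
                # Recursion: treat the nested archive's bytes as a new tar file.
                data = tar.extractfile(member).read()
                yield from search_archive(io.BytesIO(data), needles, path + '!/')
            elif member.name.endswith(TEXT_SUFFIXES):
                data = tar.extractfile(member).read()
                # Assumption: logs are UTF-8; undecodable bytes are replaced.
                text = data.decode('utf-8', errors='replace')
                for number, line in enumerate(text.splitlines(), start=1):
                    if any(needle in line for needle in needles):
                        yield path, number, line
```

To use it, open the outer archive in binary mode, for example with open('logs.tgz', 'rb'), pass the file object together with a list of strings to look for, and iterate over the results. Reading from memory avoids leaving extracted files behind, at the cost of holding one nested archive in RAM at a time.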