Python: Extracting specific files with pattern from tar.gz without extracting the complete file

Tags: python, regex, tar

I want to extract all files matching the pattern *_sl_H* from many tar.gz files, without extracting every file in the archives.

I found the following example (https://pymotw.com/2/tarfile/), but it does not seem to support wildcards:

import tarfile
import os

os.makedirs('outdir', exist_ok=True)
with tarfile.open('example.tar', 'r') as t:
    # getmember() looks up a single member by its exact name
    t.extractall('outdir', members=[t.getmember('README.txt')])
print(os.listdir('outdir'))
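For context: `TarFile.getmember()` looks up one member by its exact name and raises `KeyError` when no member has that name, so a string such as `*_sl_H*` is taken literally rather than expanded as a wildcard. The sketch below illustrates this on a small in-memory `.tar.gz`; the file names and the helpers `build_demo_archive` and `lookup` are hypothetical, made up for the demonstration.

```python
import io
import tarfile


def build_demo_archive() -> bytes:
    """Build a small gzip-compressed tar archive in memory (hypothetical file names)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in ["data_sl_H1.txt", "data_sl_H2.txt", "README.txt"]:
            payload = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def lookup(name: str):
    """Return the member name found by getmember(), or None if the lookup fails."""
    with tarfile.open(fileobj=io.BytesIO(build_demo_archive()), mode="r:*") as tar:
        try:
            return tar.getmember(name).name
        except KeyError:
            # getmember() does an exact-name lookup; wildcards are not expanded
            return None


print(lookup("README.txt"))  # README.txt
print(lookup("*_sl_H*"))     # None
```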

Does anyone have an idea? Many thanks in advance.

asked Nov 30 '22 by asator


1 Answer

Take a look at the TarFile.getmembers() method, which returns the members of the archive as a list of TarInfo objects. Once you have this list, you can use a condition to decide which files to extract.

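import tarfile
import os

os.makedirs('outdir', exist_ok=True)
# mode 'r' also opens gzip-compressed archives (.tar.gz) transparently
with tarfile.open('example.tar', 'r') as t:
    for member in t.getmembers():
        # extract only members whose name contains "_sl_H"
        if "_sl_H" in member.name:
            t.extract(member, "outdir")

print(os.listdir('outdir'))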
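Because the question asks for a shell-style pattern (`*_sl_H*`) applied to many `.tar.gz` files, the same idea can be combined with the standard-library `fnmatch` and `glob` modules. This is a minimal sketch, not part of the answer above: `extract_matching` is a hypothetical helper, and the archive glob and output directory are assumptions. It passes the filtered list to `extractall(members=...)`, which accepts any iterable of `TarInfo` objects.

```python
import fnmatch
import glob
import os
import tarfile


def extract_matching(archive_glob: str, pattern: str, outdir: str) -> list[str]:
    """Hypothetical helper: extract only members whose base name matches a shell-style pattern.

    Returns the member names that were extracted, across all matching archives.
    """
    os.makedirs(outdir, exist_ok=True)
    extracted = []
    for path in sorted(glob.glob(archive_glob)):
        # "r:*" detects gzip/bz2/xz compression automatically
        with tarfile.open(path, "r:*") as tar:
            # match the base name so directory prefixes do not affect the pattern
            members = [
                m for m in tar.getmembers()
                if m.isfile() and fnmatch.fnmatch(os.path.basename(m.name), pattern)
            ]
            tar.extractall(outdir, members=members)
            extracted.extend(m.name for m in members)
    return extracted
```

For example, `extract_matching("archives/*.tar.gz", "*_sl_H*", "outdir")` (hypothetical paths). Matching on the base name means a directory whose name contains `_sl_H` does not pull in every file beneath it; use `m.name` instead to match full paths, as the answer's substring check does. Members with the same path in different archives overwrite each other in `outdir`. On Python 3.12 and later, `extractall` also accepts a `filter` argument, and `filter="data"` is a safer choice for untrusted archives.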

answered Dec 05 '22 by Alexander