
With Python's 'tarfile', how can I get the top-most directory in a tar archive?

Tags: python, tar

I want to upload a theme archive to a Django web module and use the name of the top-most directory in the archive as the theme's name. The archive will always be in tar-gzip format and will always have only one folder at the top level (though other files may exist alongside it), with the various sub-directories containing templates, CSS, images, etc. in whatever order suits the theme best.

Currently, based on the very useful code from MegaMark16, my tool uses the following method:

f = tarfile.open(fileobj=self.theme_file, mode='r:gz')
self.name = f.getnames()[0]

Where self.theme_file is a full path to the uploaded file. This works fine as long as the order of the entries in the tarball happens to be correct, but in many cases it is not. I can certainly loop through the entire archive and manually check for the proper 'name' characteristics, but I suspect that there is a more elegant and rapid approach. Any suggestions?

asked Jun 29 '12 21:06 by The NetYeti



1 Answer

You'll want to use os.path.commonprefix.

Sample code would be something to the effect of:

import os
import tarfile
archive = tarfile.open(filepath, mode='r')
print(os.path.commonprefix(archive.getnames()))

Where the printed value would be the 'topmost directory in the archive'--or, your theme name.

Edit: on re-reading your requirements, this approach may not give the result you want if the archive has files that are siblings of the top-most directory. The common prefix would then just be . (or an empty string, depending on how the entry names are stored). It only works if ALL entries share the theme name as their common prefix.
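For the sibling-file case, one possible workaround is to take the first path component of every entry and keep only the components that are directories. The sketch below is an illustration, not part of the original answer: top_level_dirs and build_demo_archive are hypothetical helper names, and the demo archive is built in memory purely so the example can run on its own. It also sidesteps the fact that os.path.commonprefix compares names character by character rather than by path component.

```python
import io
import posixpath
import tarfile


def top_level_dirs(archive):
    """Return the sorted names of the directories at the top level of a tar archive."""
    dirs = set()
    for member in archive.getmembers():
        # Normalize away a leading "./" and split the name into path components.
        normalized = posixpath.normpath(member.name)
        parts = [p for p in normalized.split("/") if p and p != "."]
        if not parts:
            continue
        # A name with more than one component implies that its first component
        # is a directory, even if the archive has no explicit entry for it.
        if len(parts) > 1 or member.isdir():
            dirs.add(parts[0])
    return sorted(dirs)


def build_demo_archive():
    """Build a small tar.gz in memory: one theme folder plus a sibling file."""
    buf = io.BytesIO()
    entries = [
        ("README.txt", b"hi"),  # sibling file, deliberately stored first
        ("mytheme/css/site.css", b"body{}"),
        ("mytheme/templates/base.html", b"<html>"),
    ]
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_demo_archive(), mode="r:gz") as archive:
    print(top_level_dirs(archive))  # ['mytheme']
```

With exactly one folder at the top level, as in the question, the theme name would be the single element of the returned list, and a list of any other length could be treated as an invalid upload. This still walks every entry, but the tar format has no central index, so the member headers have to be read in any case.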

answered Sep 20 '22 12:09 by hexparrot