
Reading tar file contents without untarring it, in a Python script

Tags:

python

tar

I have a tar file that contains a number of files. I need to write a Python script that reads the contents of those files and reports the total number of characters (letters, spaces, newline characters, everything) without untarring the tar file.

randeepsp asked Jan 07 '10 05:01




2 Answers

You can use getmembers() to list the members of the archive:

    >>> import tarfile
    >>> tar = tarfile.open("test.tar")
    >>> tar.getmembers()

After that, you can use extractfile() to get a member as a file object, without writing anything to disk. Just an example:

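    import os
    import tarfile

    os.chdir("/tmp/foo")
    # The with block closes the archive even if reading a member fails.
    with tarfile.open("test.tar") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is None:
                # Directories and other non-file members have no content.
                continue
            content = f.read()  # bytes; decode first if you need text characters
            print("%s has %d newlines" % (member.name, content.count(b"\n")))
            print("%s has %d spaces" % (member.name, content.count(b" ")))
            print("%s has %d characters" % (member.name, len(content)))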

With the file object f in the above example, you can use read(), readlines() etc.
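To get a single total across the whole archive, as the question asks, the per-member counts can be summed. The following is only a sketch: the names count_characters and make_sample_tar are made up for illustration, and it assumes the files are UTF-8 text that fits in memory.

```python
import io
import os
import tarfile
import tempfile
from collections import Counter


def count_characters(tar_path, encoding="utf-8"):
    """Sum character statistics over every regular file in the archive."""
    totals = Counter()
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, devices
            with tar.extractfile(member) as f:
                # Assumption: members are text in the given encoding.
                text = f.read().decode(encoding, errors="replace")
            totals["characters"] += len(text)
            totals["letters"] += sum(ch.isalpha() for ch in text)
            totals["spaces"] += text.count(" ")
            totals["newlines"] += text.count("\n")
    return totals


def make_sample_tar(path):
    """Write a tiny two-file archive for the demo (hypothetical helper)."""
    samples = [("a.txt", b"hello world\n"), ("b.txt", b"one\ntwo\n")]
    with tarfile.open(path, "w") as tar:
        for name, data in samples:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample_path = os.path.join(tmp, "sample.tar")
        make_sample_tar(sample_path)
        print(dict(count_characters(sample_path)))
```

Decoding before counting means multi-byte characters count once each. If you want raw byte counts instead, skip the decode step and count on the bytes object.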

ghostdog74 answered Oct 16 '22 03:10


You need to use the tarfile module. Specifically, you use an instance of the class TarFile to open the archive, and then get the member names with TarFile.getnames():

 |  getnames(self)
 |      Return the members of the archive as a list of their names. It has
 |      the same order as the list returned by getmembers().
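
As a rough illustration of getnames(), the sketch below builds a small archive in memory and lists its names next to getmembers(). The helper build_demo_archive and the member names are invented for the demo, not part of the tarfile module.

```python
import io
import tarfile


def build_demo_archive():
    """Return an in-memory tar with one directory and one file (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        folder = tarfile.TarInfo("docs")
        folder.type = tarfile.DIRTYPE
        tar.addfile(folder)

        data = b"first line\nsecond line\n"
        info = tarfile.TarInfo("docs/readme.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)  # rewind so the archive can be read back
    return buf


with tarfile.open(fileobj=build_demo_archive()) as tar:
    names = tar.getnames()      # plain strings
    members = tar.getmembers()  # TarInfo objects, same order as names
    same_order = names == [m.name for m in members]
    for m in members:
        kind = "dir" if m.isdir() else "file"
        print(kind, m.name, m.size)
```

Note that getnames() includes directories too, so filter with TarInfo.isfile() if you only want regular files.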

If you want to read the content instead, use this method:

 |  extractfile(self, member)
 |      Extract a member from the archive as a file object. `member' may be
 |      a filename or a TarInfo object. If `member' is a regular file, a
 |      file-like object is returned. If `member' is a link, a file-like
 |      object is constructed from the link's target. If `member' is none of
 |      the above, None is returned.
 |      The file-like object is read-only and provides the following
 |      methods: read(), readline(), readlines(), seek() and tell()
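
A short sketch of the behaviour described in that docstring, using a throwaway in-memory archive (the archive contents and variable names are assumptions for the demo): extractfile() accepts a member name, the returned object supports readline(), tell(), seek() and read(), and a directory yields None.

```python
import io
import tarfile

# Build a small archive in memory: one directory and one text file.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    data = b"first line\nsecond line\n"
    info = tarfile.TarInfo("docs/readme.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)

with tarfile.open(fileobj=buf) as tar:
    f = tar.extractfile("docs/readme.txt")  # a name works, as does a TarInfo
    first_line = f.readline()               # read one line as bytes
    position = f.tell()                     # offset just past that line
    f.seek(0)                               # rewind to the start of the member
    everything = f.read()                   # whole content, nothing written to disk

    directory_result = tar.extractfile("docs")  # not a file or link

print(first_line, position, len(everything), directory_result)
```

Because a non-file member gives None, code that loops over all members should check the return value before calling read().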

Stefano Borini answered Oct 16 '22 02:10