In the program I maintain it is done as in:
# count the files in the archive
length = 0
command = ur'"%s" l -slt "%s"' % (u'path/to/7z.exe', srcFile)
ins, err = Popen(command, stdout=PIPE, stdin=PIPE,
startupinfo=startupinfo).communicate()
ins = StringIO.StringIO(ins)
for line in ins: length += 1
ins.close()
What about error checking ? Would it be enough to modify this to:
proc = Popen(command, stdout=PIPE, stdin=PIPE,
startupinfo=startupinfo)
out = proc.stdout
# ... count
returncode = proc.wait()
if returncode:
raise Exception(u'Failed reading number of files from ' + srcFile)
or should I actually parse the output of Popen ?
EDIT: interested in 7z, rar, zip archives (that are supported by 7z.exe) - but 7z and zip would be enough for starters
To determine how many files there are in the current directory, put in ls -1 | wc -l. This uses wc to do a count of the number of lines (-l) in the output of ls -1.
On the contextual menu, select Properties. Alternatively, select the folder and press the Alt + Enter keys on your keyboard. When the Properties window opens, Windows 10 automatically starts counting the files and folders inside the selected directory.
To count the number of archive members in a zip archive in Python:
#!/usr/bin/env python
import sys
from contextlib import closing
from zipfile import ZipFile
with closing(ZipFile(sys.argv[1])) as archive:
count = len(archive.infolist())
print(count)
It may use zlib
, bz2
, lzma
modules if available, to decompress the archive.
To count the number of regular files in a tar archive:
#!/usr/bin/env python
import sys
import tarfile
with tarfile.open(sys.argv[1]) as archive:
count = sum(1 for member in archive if member.isreg())
print(count)
It may support gzip
, bz2
and lzma
compression depending on version of Python.
You could find a 3rd-party module that would provide a similar functionality for 7z archives.
To get the number of files in an archive using 7z
utility:
import os
import subprocess
def count_files_7z(archive):
s = subprocess.check_output(["7z", "l", archive], env=dict(os.environ, LC_ALL="C"))
return int(re.search(br'(\d+)\s+files,\s+\d+\s+folders$', s).group(1))
Here's version that may use less memory if there are many files in the archive:
import os
import re
from subprocess import Popen, PIPE, CalledProcessError
def count_files_7z(archive):
command = ["7z", "l", archive]
p = Popen(command, stdout=PIPE, bufsize=1, env=dict(os.environ, LC_ALL="C"))
with p.stdout:
for line in p.stdout:
if line.startswith(b'Error:'): # found error
error = line + b"".join(p.stdout)
raise CalledProcessError(p.wait(), command, error)
returncode = p.wait()
assert returncode == 0
return int(re.search(br'(\d+)\s+files,\s+\d+\s+folders', line).group(1))
Example:
import sys
try:
print(count_files_7z(sys.argv[1]))
except CalledProcessError as e:
getattr(sys.stderr, 'buffer', sys.stderr).write(e.output)
sys.exit(e.returncode)
To count the number of lines in the output of a generic subprocess:
from functools import partial
from subprocess import Popen, PIPE, CalledProcessError
p = Popen(command, stdout=PIPE, bufsize=-1)
with p.stdout:
read_chunk = partial(p.stdout.read, 1 << 15)
count = sum(chunk.count(b'\n') for chunk in iter(read_chunk, b''))
if p.wait() != 0:
raise CalledProcessError(p.returncode, command)
print(count)
It supports unlimited output.
Could you explain why buffsize=-1 (as opposed to buffsize=1 in your previous answer: stackoverflow.com/a/30984882/281545)
bufsize=-1
means use the default I/O buffer size instead of bufsize=0
(unbuffered) on Python 2. It is a performance boost on Python 2. It is default on the recent Python 3 versions. You might get a short read (lose data) if on some earlier Python 3 versions where bufsize
is not changed to bufsize=-1
.
This answer reads in chunks and therefore the stream is fully buffered for efficiency. The solution you've linked is line-oriented. bufsize=1
means "line buffered". There is minimal difference from bufsize=-1
otherwise.
and also what the read_chunk = partial(p.stdout.read, 1 << 15) buys us ?
It is equivalent to read_chunk = lambda: p.stdout.read(1<<15)
but provides more introspection in general. It is used to implement wc -l
in Python efficiently.
Since I already have 7z.exe bundled with the app and I surely want to avoid a third party lib, while I do need to parse rar and 7z archives I think I will go with:
regErrMatch = re.compile(u'Error:', re.U).match # needs more testing
r"""7z list command output is of the form:
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2015-06-29 21:14:04 ....A <size> <filename>
where ....A is the attribute value for normal files, ....D for directories
"""
reFileMatch = re.compile(ur'(\d|:|-|\s)*\.\.\.\.A', re.U).match
def countFilesInArchive(srcArch, listFilePath=None):
"""Count all regular files in srcArch (or only the subset in
listFilePath)."""
# https://stackoverflow.com/q/31124670/281545
command = ur'"%s" l -scsUTF-8 -sccUTF-8 "%s"' % ('compiled/7z.exe', srcArch)
if listFilePath: command += u' @"%s"' % listFilePath
proc = Popen(command, stdout=PIPE, startupinfo=startupinfo, bufsize=-1)
length, errorLine = 0, []
with proc.stdout as out:
for line in iter(out.readline, b''):
line = unicode(line, 'utf8')
if errorLine or regErrMatch(line):
errorLine.append(line)
elif reFileMatch(line):
length += 1
returncode = proc.wait()
if returncode or errorLine: raise StateError(u'%s: Listing failed\n' +
srcArch + u'7z.exe return value: ' + str(returncode) +
u'\n' + u'\n'.join([x.strip() for x in errorLine if x.strip()]))
return length
Error checking as in Python Popen - wait vs communicate vs CalledProcessError by @JFSebastien
My final(ish) based on accepted answer - unicode may not be needed, kept it for now as I use it everywhere. Also kept regex (which I may expand, I have seen things like re.compile(u'^(Error:.+|.+ Data Error?|Sub items Errors:.+)',re.U)
. Will have to look into check_output and CalledProcessError.
def countFilesInArchive(srcArch, listFilePath=None):
"""Count all regular files in srcArch (or only the subset in
listFilePath)."""
command = [exe7z, u'l', u'-scsUTF-8', u'-sccUTF-8', srcArch]
if listFilePath: command += [u'@%s' % listFilePath]
proc = Popen(command, stdout=PIPE, stdin=PIPE, # stdin needed if listFilePath
startupinfo=startupinfo, bufsize=1)
errorLine = line = u''
with proc.stdout as out:
for line in iter(out.readline, b''): # consider io.TextIOWrapper
line = unicode(line, 'utf8')
if regErrMatch(line):
errorLine = line + u''.join(out)
break
returncode = proc.wait()
msg = u'%s: Listing failed\n' % srcArch.s
if returncode or errorLine:
msg += u'7z.exe return value: ' + str(returncode) + u'\n' + errorLine
elif not line: # should not happen
msg += u'Empty output'
else: msg = u''
if msg: raise StateError(msg) # consider using CalledProcessError
# number of files is reported in the last line - example:
# 3534900 325332 75 files, 29 folders
return int(re.search(ur'(\d+)\s+files,\s+\d+\s+folders', line).group(1))
Will edit this with my findings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With