I'm porting a bash script to python 2.6, and want to replace some code:
cat $( ls -tr xyz_`date +%F`_*.log ) | filter args > bzip2
I guess I want something similar to the "Replacing shell pipe line" example at http://docs.python.org/release/2.6/library/subprocess.html, ala...
p1 = Popen(["filter", "args"], stdin=*?WHAT?*, stdout=PIPE)
p2 = Popen(["bzip2"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
But, I'm not sure how best to provide p1
's stdin
value so it concatenates the input files. Seems I could add...
p0 = Popen(["cat", "file1", "file2"...], stdout=PIPE)
p1 = ... stdin=p0.stdout ...
...but that seems to be crossing beyond use of (slow, inefficient) pipes to call external programs with significant functionality. (Any decent shell performs the cat
internally.)
So, I can imagine a custom class that satisfies the file object API requirements and can therefore be used for p1's stdin, concatenating arbitrary other file objects. (EDIT: existing answers explain why this isn't possible)
Does python 2.6 have a mechanism addressing this need/want, or might another Popen
to cat
be considered perfectly fine in python circles?
Thanks.
If successful, popen() returns a pointer to an open stream that can be used to read or write to a pipe. If unsuccessful, popen() returns a NULL pointer and sets errno to one of the following values: Error Code. Description.
Popen do we need to close the connection or subprocess automatically closes the connection? Usually, the examples in the official documentation are complete. There the connection is not closed. So you do not need to close most probably.
The subprocess module defines one class, Popen and a few wrapper functions that use that class. The constructor for Popen takes arguments to set up the new process so the parent can communicate with it via pipes. It provides all of the functionality of the other modules and functions it replaces, and more.
Python method popen() opens a pipe to or from command. The return value is an open file object connected to the pipe, which can be read or written depending on whether mode is 'r' (default) or 'w'.
You can replace everything that you're doing with Python code, except for your external utility. That way your program will remain portable as long as your external util is portable. You can also consider turning the C++ program into a library and using Cython to interface with it. As Messa showed, date
is replaced with time.strftime
, globbing is done with glob.glob
and cat
can be replaced with reading all the files in the list and writing them to the input of your program. The call to bzip2
can be replaced with the bz2
module, but that will complicate your program because you'd have to read and write simultaneously. To do that, you need to either use p.communicate
or a thread if the data is huge (select.select
would be a better choice but it won't work on Windows).
import sys
import bz2
import glob
import time
import threading
import subprocess
output_filename = '../whatever.bz2'
input_filenames = glob.glob(time.strftime("xyz_%F_*.log"))
p = subprocess.Popen(['filter', 'args'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
output = open(output_filename, 'wb')
output_compressor = bz2.BZ2Compressor()
def data_reader():
for filename in input_filenames:
f = open(filename, 'rb')
p.stdin.writelines(iter(lambda: f.read(8192), ''))
p.stdin.close()
input_thread = threading.Thread(target=data_reader)
input_thread.start()
with output:
for chunk in iter(lambda: p.stdout.read(8192), ''):
output.write(output_compressor.compress(chunk))
output.write(output_compressor.flush())
input_thread.join()
p.wait()
You can use either the file extension or the Python bindings for libmagic to detect how the file is compressed. Here's a code example that does both, and automatically chooses magic
if it is available. You can take the part that suits your needs and adapt it to your needs. The open_autodecompress
should detect the mime encoding and open the file with the appropriate decompressor if it is available.
import os
import gzip
import bz2
try:
import magic
except ImportError:
has_magic = False
else:
has_magic = True
mime_openers = {
'application/x-bzip2': bz2.BZ2File,
'application/x-gzip': gzip.GzipFile,
}
ext_openers = {
'.bz2': bz2.BZ2File,
'.gz': gzip.GzipFile,
}
def open_autodecompress(filename, mode='r'):
if has_magic:
ms = magic.open(magic.MAGIC_MIME_TYPE)
ms.load()
mimetype = ms.file(filename)
opener = mime_openers.get(mimetype, open)
else:
basepart, ext = os.path.splitext(filename)
opener = ext_openers.get(ext, open)
return opener(filename, mode)
If you look inside the subprocess
module implementation, you will see that std{in,out,err} are expected to be fileobjects supporting fileno()
method, so a simple concatinating file-like object with python interface (or even a StringIO object) is not suitable here.
If it were iterators, not file objects, you could use itertools.chain
.
Of course, sacrificing the memory consumption you can do something like this:
import itertools, os
# ...
files = [f for f in os.listdir(".") if os.path.isfile(f)]
input = ''.join(itertools.chain(open(file) for file in files))
p2.communicate(input)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With