I have a basic doubt regarding the 3 families of functions below:
Say for example, I would like to know the current working directory in Python. So some of the ways of achieving this could be :
os.system("pwd")
os.getcwd()
subprocess.Popen(['pwd'], stdout=PIPE, stderr=PIPE)
I found many resources online differentiating the third one above with the rest, but could not find much particularly around the difference between the first 2 above.
Please help me understand the difference among the 3, especially
Any pointers to for my self reference would also be greatly appreciated.
The best way of finding the performance issues is to measure it yourself. Python implementations vary. Equally, if you want to find out how they are implemented then read the source code.
All these mechanisms will probably call the same underlying C library interface getcwd(3)
on UNIX-like systems. Its just that some use an external program to get there.
os.system("pwd")
runs a shell (1st process) to run the external program pwd
(2nd process). So the performance will be poor and it is not portable since pwd
is not implemented on all platforms that support Python. The output goes the stdout
, it is not captured for use in the program.
You omitted the function popen
:
wd = os.popen('pwd').readline()
This is considered obsolete, and so is os.system()
, both being replaced by subprocess
. The popen
implementation might use the C runtime-library function popen
, which also invokes a shell to execute the command, so performance might be poor.
The subprocess
module was meant to replace these old interfaces, hence the duplication. One advantage is that the use of a shell is optional, and in this case you don't need it. But your code is incomplete:
proc = subprocess.Popen(['pwd'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
wd, err = proc.communicate()
if err:
print >> sys.stderr, err
else:
print wd,
That is going to be more efficient because there is no shell process. But really you should benchmark it yourself on your own system.
All the interfaces so far are general - they are for running a large variety of commands and external programs. The fact that they are often abused and used inappropriately is a refection of the lack of understanding (or laziness) of the programmer.
We are left with os.getcwd()
. It is probably implemented as a direct call to the C library interface, but might circumvent that to call the kernel. Check the source code on your implementation. No child processes, so faster.
Your point about duplication could also be made for many other interfaces, os.remove()
, os.mkdir()
, os.stat()
, and so on. The point is that these are fast and portable, they don't run other programs.
Having these direct interfaces is a major advantage of using a language like Python compared to using a shell. Bash has no built-in command for getting the current directory, getting a file size, deleting a file, etc. It relies on external programs - python does not.
The os.getcwd()
call will be faster, as it doesn't rely on any external dependencies. Your first and last example both execute a separate process, the pwd
system command, and return its output as a string. Well, your Popen()
example will require you to get its output using another call, but I digress.
There's simply no need to call an external command; just use os.getcwd()
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With