Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to speed-up python IO?

Consider this python program:

import sys

lc = 0
for line in open(sys.argv[1]):
    lc = lc + 1

print lc, sys.argv[1]

Running it on my 6GB text file, it completes in ~ 2minutes.

Question: is it possible to go faster?

Note that the same time is required by:

wc -l myfile.txt

so, I suspect the anwer to my quesion is just a plain "no".

Note also that my real program is doing something more interesting than just counting the lines, so please give a generic answer, not line-counting-tricks (like keeping a line count metadata in the file)

PS: I tagged "linux" this question, because I'm interested only in linux-specific answers. Feel free to give OS-agnostic, or even other-OS answers, if you have them.

See also the follow-up question

like image 847
Davide Avatar asked May 11 '09 17:05

Davide


1 Answers

Throw hardware at the problem.

As gs pointed out, your bottleneck is the hard disk transfer rate. So, no you can't use a better algorithm to improve your time, but you can buy a faster hard drive.

Edit: Another good point by gs; you could also use a RAID configuration to improve your speed. This can be done either with hardware or software (e.g. OS X, Linux, Windows Server, etc).


Governing Equation

(Amount to transfer) / (transfer rate) = (time to transfer)

(6000 MB) / (60 MB/s) = 100 seconds

(6000 MB) / (125 MB/s) = 48 seconds


Hardware Solutions

The ioDrive Duo is supposedly the fastest solution for a corporate setting, and "will be available in April 2009".

Or you could check out the WD Velociraptor hard drive (10,000 rpm).

Also, I hear the Seagate Cheetah is a good option (15,000 rpm with sustained 125MB/s transfer rate).

like image 91
tgray Avatar answered Sep 25 '22 11:09

tgray