Consider this Python program:
import sys
lc = 0
for line in open(sys.argv[1]):
    lc = lc + 1
print lc, sys.argv[1]
Running it on my 6 GB text file, it completes in ~2 minutes.
Question: is it possible to go faster?
Note that the same time is required by:
wc -l myfile.txt
so I suspect the answer to my question is just a plain "no".
Note also that my real program does something more interesting than just counting lines, so please give a generic answer, not line-counting tricks (like keeping a line count as metadata in the file).
PS: I tagged this question "linux" because I'm interested only in Linux-specific answers. Feel free to give OS-agnostic, or even other-OS, answers if you have them.
See also the follow-up question
Throw hardware at the problem.
As gs pointed out, your bottleneck is the hard disk transfer rate. So, no, you can't use a better algorithm to improve your time, but you can buy a faster hard drive.
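If you want to confirm that the disk, and not the Python loop, is the limiting factor, one rough check is to time a plain sequential read of the same file in large blocks and compare the resulting throughput with your drive's rated speed. A minimal sketch (the block size and variable names are my own choices, not from the original question):

import sys
import time

# Read the file in 1 MiB blocks and report the effective throughput.
# Note: on a second run the OS page cache may serve the data from RAM,
# so the number only reflects the disk itself on a cold cache.
path = sys.argv[1]
start = time.time()
total = 0
with open(path, 'rb') as f:
    while True:
        block = f.read(1024 * 1024)
        if not block:
            break
        total += len(block)
elapsed = time.time() - start
print("%d bytes in %.1f s -> %.1f MB/s" % (total, elapsed, total / elapsed / 1e6))

If this raw read takes about as long as your real program, the work is I/O-bound and faster hardware is the only lever; if it is much faster, the Python-side processing is worth profiling.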
Edit: Another good point by gs: you could also use a RAID configuration to improve your speed. This can be done with either hardware or software (e.g. OS X, Linux, Windows Server, etc.).
Governing Equation
(Amount to transfer) / (transfer rate) = (time to transfer)
(6000 MB) / (60 MB/s) = 100 seconds
(6000 MB) / (125 MB/s) = 48 seconds
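The same arithmetic as a tiny helper, just to make the units explicit (the function name and numbers below are mine, restating the figures above):

# time (s) = amount to transfer (MB) / transfer rate (MB/s)
def transfer_time_seconds(size_mb, rate_mb_per_s):
    return size_mb / rate_mb_per_s

print(transfer_time_seconds(6000.0, 60.0))   # 100.0 s (typical single HDD)
print(transfer_time_seconds(6000.0, 125.0))  # 48.0 s (faster drive or RAID)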
Hardware Solutions
The ioDrive Duo is supposedly the fastest solution for a corporate setting, and "will be available in April 2009".
Or you could check out the WD Velociraptor hard drive (10,000 rpm).
Also, I hear the Seagate Cheetah is a good option (15,000 rpm with sustained 125MB/s transfer rate).