When comparing these two ways of doing the same thing:
import numpy as np
import time

start_time = time.time()
for j in range(1000):
    bv = np.loadtxt('file%d.dat' % (j + 1))
    if j % 100 == 0:
        print(bv[300, 0])
T1 = time.time() - start_time
print("--- %s seconds ---" % T1)
and
import numpy as np
import time

start_time = time.time()
for j in range(1000):
    a = open('file%d.dat' % (j + 1), 'r')
    b = a.readlines()
    a.close()
    for i in range(len(b)):
        b[i] = b[i].strip("\n")
        b[i] = b[i].split("\t")
        b[i] = [float(x) for x in b[i]]  # map() alone would leave an iterator in Python 3
    bv = np.asarray(b)
    if j % 100 == 0:
        print(bv[300, 0])
T1 = time.time() - start_time
print("--- %s seconds ---" % T1)
I have noticed that the second one is way faster. Is there any way to get something as concise as the first method and as fast as the second one? And why is loadtxt so slow compared with doing the same task manually?
With a simple, not-too-large CSV file created with:
In [898]: arr = np.ones((1000,100))
In [899]: np.savetxt('float.csv',arr)
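(A quick sanity check, not part of the original session: np.savetxt's default format is '%.18e' with a single-space delimiter, so each of the 1000 lines holds 100 space-separated float fields.)

# inspect the layout np.savetxt produced (default fmt='%.18e', delimiter=' ')
with open('float.csv') as f:
    first = f.readline()
print(len(first.split()))  # 100 fields per line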
the loadtxt version:
In [900]: timeit data = np.loadtxt('float.csv')
112 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
np.fromfile can also load text, though it doesn't preserve any shape information (and shows no apparent speed advantage here):
In [901]: timeit data = np.fromfile('float.csv', dtype=float, sep=' ').reshape(-1,100)
129 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
the most concise version of the 'manual' approach that I can come up with:
In [902]: %%timeit
...: with open('float.csv') as f:
   ...:     data = np.array([line.strip().split() for line in f], float)
52.9 ms ± 589 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
This roughly 2x improvement over loadtxt seems typical of variations on this approach.
pd.read_csv takes about the same time.
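For reference, a minimal sketch of the pandas version (the sep=' ' and header=None arguments are assumptions, chosen to match the space-separated, headerless file np.savetxt wrote):

import pandas as pd

# assumed call: space-delimited, no header row; .values gives the ndarray
data = pd.read_csv('float.csv', sep=' ', header=None).values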
genfromtxt is a bit faster than loadtxt:
In [907]: timeit data = np.genfromtxt('float.csv')
98.2 ms ± 4.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
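As for why loadtxt is so slow: in NumPy versions of this era it parses the file line by line in pure Python, with general-purpose machinery for comments, converters, skipped rows and so on, and for a plain all-float file most of that generality is overhead the handwritten loop avoids. If you want loadtxt's conciseness with the manual loop's speed, one option is to wrap the fast version in a small helper (a hypothetical convenience function, not part of the original answer):

import numpy as np

def load_floats(fname):
    # concise wrapper around the fast 'manual' approach timed above;
    # assumes whitespace-separated float columns, no comments or header
    with open(fname) as f:
        return np.array([line.split() for line in f], dtype=float)

data = load_floats('float.csv')  # same ~2x speedup over np.loadtxt on this file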