I've got a program that reads in 3 strings per line for 50,000 lines. It then does other things. The part that reads the file and converts the strings to numbers is taking 80% of the total running time.
My code snippet is below:
import time

file = open('E:/temp/edges_big.txt').readlines()
start_time = time.time()
for line in file[1:]:
    label1, label2, edge = line.strip().split()
    # label1 = int(label1); label2 = int(label2); edge = float(edge)
    # Rest of the loop deleted
print('processing file took ', time.time() - start_time, "seconds")
The above takes about 0.84 seconds. Now, when I uncomment the line
label1 = int(label1); label2 = int(label2); edge = float(edge)
the runtime rises to about 3.42 seconds.
The input file has the form str1 str2 str3 on each line.
Are the functions int() and float() that slow? How could I optimize this?
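A quick way to see how much of that time is the conversions themselves, rather than the splitting and loop overhead, is to time int() and float() in isolation with timeit (a minimal sketch; exact timings depend on the machine):

import timeit

# Per-call cost of the conversions alone, over one million calls each.
print(timeit.timeit("int('12345')", number=10**6), 'sec for 1e6 int() calls')
print(timeit.timeit("float('0.355243621018')", number=10**6), 'sec for 1e6 float() calls')

# Cost of splitting a typical line, for comparison.
print(timeit.timeit("'150 952 0.355243621018'.strip().split()", number=10**6),
      'sec for 1e6 strip()/split() calls')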
If the file is in the OS cache, then parsing it takes milliseconds on my machine:
name                                  time       ratio    comment
read_read                          145 usec       1.00    big.txt
read_readtxt                      2.07 msec      14.29    big.txt
read_readlines                    4.94 msec      34.11    big.txt
read_james_otigo                  29.3 msec     201.88    big.txt
read_james_otigo_with_int_float   82.9 msec     571.70    big.txt
read_map_local                    93.1 msec     642.23    big.txt
read_map                          95.6 msec     659.57    big.txt
read_numpy_loadtxt                 321 msec    2213.66    big.txt
Where the read_*() functions are:
def read_read(filename):
    with open(filename, 'rb') as file:
        data = file.read()

def read_readtxt(filename):
    with open(filename, 'rU') as file:
        text = file.read()

def read_readlines(filename):
    with open(filename, 'rU') as file:
        lines = file.readlines()

def read_james_otigo(filename):
    file = open(filename).readlines()
    for line in file[1:]:
        label1, label2, edge = line.strip().split()

def read_james_otigo_with_int_float(filename):
    file = open(filename).readlines()
    for line in file[1:]:
        label1, label2, edge = line.strip().split()
        label1 = int(label1); label2 = int(label2); edge = float(edge)

def read_map(filename):
    with open(filename) as file:
        L = [(int(l1), int(l2), float(edge))
             for line in file
             for l1, l2, edge in [line.split()] if line.strip()]

def read_map_local(filename, _i=int, _f=float):
    # Binding int/float to default arguments makes them local-name lookups
    # inside the comprehension, which is slightly faster than global lookups.
    with open(filename) as file:
        L = [(_i(l1), _i(l2), _f(edge))
             for line in file
             for l1, l2, edge in [line.split()] if line.strip()]
import numpy as np

def read_numpy_loadtxt(filename):
    a = np.loadtxt(filename, dtype=[('label1', 'i'),
                                    ('label2', 'i'),
                                    ('edge', 'f')])
And big.txt is generated using:
#!/usr/bin/env python
import numpy as np
n = 50000
a = np.random.random_integers(low=0, high=1<<10, size=2*n).reshape(-1, 2)
np.savetxt('big.txt', np.c_[a, np.random.rand(n)], fmt='%i %i %s')
It produces 50000 lines:
150 952 0.355243621018
582 98 0.227592557278
478 409 0.546382780254
46 879 0.177980983303
...
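Keeping the parsed tuples, rather than discarding them as the benchmark functions do, could look like this (a sketch based on read_map_local above; read_edges is a made-up name):

def read_edges(filename, _i=int, _f=float):
    # Parse "label1 label2 edge" lines into (int, int, float) tuples.
    with open(filename) as file:
        return [(_i(l1), _i(l2), _f(edge))
                for line in file
                for l1, l2, edge in [line.split()] if line.strip()]

edges = read_edges('big.txt')
# First tuple for the sample above: (150, 952, 0.355243621018)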
To reproduce results, download the code and run:
# write big.txt
python generate-file.py
# run benchmark
python read-array.py
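read-array.py itself is not reproduced here; a minimal harness in the same spirit (the structure below is an assumption, not the actual script) could time each function with timeit:

import timeit

# Assumes the read_*() functions above are defined in the same module.
FUNCS = [read_read, read_readtxt, read_readlines,
         read_james_otigo, read_james_otigo_with_int_float,
         read_map_local, read_map, read_numpy_loadtxt]

def benchmark(filename='big.txt', repeat=3, number=10):
    # Best-of-`repeat` time per call for every read_*() function.
    results = []
    for func in FUNCS:
        best = min(timeit.repeat(lambda: func(filename),
                                 repeat=repeat, number=number)) / number
        results.append((func.__name__, best))
    fastest = min(t for _, t in results)
    for name, t in sorted(results, key=lambda r: r[1]):
        print('%-35s %12.6f sec   ratio %8.2f' % (name, t, t / fastest))

benchmark()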