I'd like to read all the integers from a file into a single list. The numbers are separated by one or more spaces or by one or more newline characters. What is the most efficient and/or elegant way of doing this? I have two solutions, but I don't know whether they are any good.
Checking for digits:
my_list = []
for line in open("foo.txt", "r"):
    for i in line.strip().split(' '):
        if i.isdigit():
            my_list.append(int(i))
Dealing with exceptions:
my_list = []
for line in open("foo.txt", "r"):
    for i in line.strip().split(' '):
        try:
            my_list.append(int(i))
        except ValueError:
            # '' (from repeated spaces) and non-numeric tokens end up here
            pass
Sample data:
1 2 3
4 56
789
9 91 56
10
11
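An efficient way to do this is your first method with two small changes: open the file in a with statement so that it is closed reliably, and call split() with no argument so that any run of whitespace counts as a single separator. Example -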
my_list = []
with open("foo.txt", "r") as f:
    for line in f:
        for i in line.split():
            if i.isdigit():
                my_list.append(int(i))
Timing tests comparing this with other methods -
The functions -
import csv  # needed by func3()


def func1():
    my_list = []
    for line in open("foo.txt", "r"):
        for i in line.strip().split(' '):
            if i.isdigit():
                my_list.append(int(i))
    return my_list


def func1_1():
    return [int(i) for line in open("foo.txt", "r") for i in line.strip().split(' ') if i.isdigit()]


def func1_3():
    my_list = []
    with open("foo.txt", "r") as f:
        for line in f:
            for i in line.split():
                if i.isdigit():
                    my_list.append(int(i))
    return my_list


def func2():
    my_list = []
    for line in open("foo.txt", "r"):
        for i in line.split():
            try:
                my_list.append(int(i))
            except ValueError:
                pass
    return my_list


def func3():
    my_list = []
    with open("foo.txt", "r") as f:
        cf = csv.reader(f, delimiter=' ')
        for row in cf:
            my_list.extend([int(i) for i in row if i.isdigit()])
    return my_list
Results of timing tests -
In [25]: timeit func1()
The slowest run took 4.70 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 204 µs per loop
In [26]: timeit func1_1()
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 207 µs per loop
In [27]: timeit func1_3()
The slowest run took 5.46 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 191 µs per loop
In [28]: timeit func2()
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 212 µs per loop
In [34]: timeit func3()
The slowest run took 4.38 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 202 µs per loop
Of the methods that store the data in a list, I believe func1_3() above is the fastest (as shown by the timeit results), although the margins are small.
That said, if you are really handling very large files, you may be better off using a generator rather than storing the complete list in memory.
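A minimal sketch of that generator idea, assuming the same whitespace-separated format. The name iter_ints and the temporary demo file are illustrative only and were not part of the timing tests:

```python
import os
import tempfile


def iter_ints(path):
    """Yield integers from a whitespace-separated text file, one at a time."""
    with open(path, "r") as f:
        for line in f:
            for token in line.split():
                if token.isdigit():  # non-numeric tokens are skipped
                    yield int(token)


# Throwaway demo file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 3\n4 56\n\n789\n")
    demo_path = tmp.name

demo_values = list(iter_ints(demo_path))  # materialised only for the demo
total = sum(iter_ints(demo_path))         # consumed lazily, no list is built
os.remove(demo_path)
```

Because the values are produced one at a time, something like sum() or a for loop can process the file without holding every integer in memory at once.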
UPDATE: It was suggested in the comments that func2() is faster than func1_3() (though on my system it was never faster than func1_3(), even with a file containing only integers), so I updated foo.txt to contain things other than numbers and ran the timing tests again -
foo.txt
1 2 10 11
asd dd
dds asda
22 44 32 11 23
dd dsa dds
21 12
12
33
45
dds
asdas
dasdasd dasd das d asda sda
Test -
In [13]: %timeit func1_3()
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 210 µs per loop
In [14]: %timeit func2()
1000 loops, best of 3: 279 µs per loop
In [15]: %timeit func1_3()
1000 loops, best of 3: 213 µs per loop
In [16]: %timeit func2()
1000 loops, best of 3: 273 µs per loop
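To repeat this kind of comparison outside IPython, the standard-library timeit module can be used. The sketch below is not the setup that produced the numbers above: with_isdigit and with_try are hypothetical names that mirror func1_3() and func2(), the file is a small temporary sample, and the timings will differ from machine to machine.

```python
import os
import tempfile
import timeit


def with_isdigit(path):
    # Mirrors func1_3(): filter tokens with str.isdigit().
    result = []
    with open(path, "r") as f:
        for line in f:
            for tok in line.split():
                if tok.isdigit():
                    result.append(int(tok))
    return result


def with_try(path):
    # Mirrors func2(): attempt the conversion and ignore failures.
    result = []
    with open(path, "r") as f:
        for line in f:
            for tok in line.split():
                try:
                    result.append(int(tok))
                except ValueError:
                    pass
    return result


# Small mixed sample: numbers and words, as in the updated foo.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 10 11\nasd dd\n22 44 32 11 23\ndd dsa dds\n21 12\n")
    path = tmp.name

same_output = with_isdigit(path) == with_try(path)

runs = 200
t_isdigit = min(timeit.repeat(lambda: with_isdigit(path), number=runs, repeat=3))
t_try = min(timeit.repeat(lambda: with_try(path), number=runs, repeat=3))
print(f"isdigit: {t_isdigit / runs * 1e6:.1f} us per call")
print(f"try/except: {t_try / runs * 1e6:.1f} us per call")

os.remove(path)
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.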
It's pretty easy if you can read the whole file as a string (i.e. it's not too large to do that).
with open('foo.txt') as f:
    tokens = f.read().split()
integers = [int(x) for x in tokens if x.isdigit()]
read() turns the file into one long string, and split() splits it into a list of strings on whitespace (i.e. spaces and newlines). You can then combine that with a list comprehension that converts the strings to integers if they consist of digits.
As Bakuriu noted, if the file is guaranteed to contain only whitespace and numbers, you don't have to check isdigit(). Using list(map(int, open('foo.txt').read().split())) would be enough in that case. That method raises an error if any token is not a valid integer, whereas the other skips anything that isn't made up entirely of digits.
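One practical consequence of that difference: isdigit() also rejects tokens with a sign, so negative numbers are silently dropped, whereas int() accepts them. A small illustration with a made-up token list (the values are hypothetical, not taken from foo.txt):

```python
tokens = ["42", "-5", "+3", "abc", "7"]

# isdigit() filter: only plain, unsigned digit strings pass.
via_isdigit = [int(t) for t in tokens if t.isdigit()]

# try/except filter: anything int() can parse is kept.
via_try = []
for t in tokens:
    try:
        via_try.append(int(t))
    except ValueError:
        pass

print(via_isdigit)  # [42, 7]
print(via_try)      # [42, -5, 3, 7]
```

If the file may contain signed integers, the try/except form (or the plain map(int, ...) call on clean data) is probably the safer choice.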