The file contains:
1 19 15 36 23 18 39
2 36 23 4 18 26 9
3 35 6 16 11
From that I'd like to extract a list as follows:
L = [1,19,15,36,23,18,39,2,36,... etc.]
What is the most efficient way to do so?
You can read a text file with open() and readlines(), and split each line into a list of values with split(), which breaks a string into a list on whitespace (or any separator you pass).
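For instance, a minimal sketch of that open()/split() approach (the file name in.txt is just a placeholder, and the values are still strings at this point):

numbers = []
with open("in.txt") as f:
    for line in f.readlines():
        numbers.extend(line.split())   # split each line on whitespace
print(numbers)   # e.g. ['1', '19', '15', ...]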
You can use itertools.chain, splitting each line and mapping to ints:
from itertools import chain

with open("in.txt") as f:
    print(list(map(int, chain.from_iterable(line.split() for line in f))))
[1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]
For Python 2, use itertools.imap instead of map. Using map with chain.from_iterable over the file's lines avoids reading the whole file into memory at once, which is what .read() would do.
Some timings on Python 3, on a file containing your input repeated 1000 times (with re and itertools.chain already imported in the session):
In [5]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(map(int,re.split(r"\s+",f.read())))
   ...:
100 loops, best of 3: 8.55 ms per loop

In [6]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(map(int, chain.from_iterable(line.split() for line in f)))
   ...:
100 loops, best of 3: 5.76 ms per loop

In [7]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     [int(i) for i in f.read().split()]
   ...:
100 loops, best of 3: 5.82 ms per loop
So the itertools approach matches the list comp on speed but uses a lot less memory.
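If you want to verify the memory claim for your own data, tracemalloc from the standard library can report peak allocations; here is a small sketch (the file name and helper are illustrative, no numbers are implied):

import tracemalloc
from itertools import chain

def peak_bytes(fn):
    # run fn once and return the peak traced allocation in bytes
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def read_split():
    with open("ints.txt") as f:
        return [int(i) for i in f.read().split()]

def chained():
    with open("ints.txt") as f:
        return list(map(int, chain.from_iterable(line.split() for line in f)))

print("read/split peak:", peak_bytes(read_split))
print("chain peak:     ", peak_bytes(chained))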
For Python 2:
In [3]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     [int(i) for i in f.read().split()]
   ...:
100 loops, best of 3: 7.79 ms per loop

In [4]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(imap(int, chain.from_iterable(line.split() for line in f)))
   ...:
100 loops, best of 3: 8.03 ms per loop

In [5]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(imap(int,re.split(r"\s+",f.read())))
   ...:
100 loops, best of 3: 10.6 ms per loop
The list comp is marginally faster but again uses more memory. If you were going to read everything into memory with the read/split approach anyway, imap is again the fastest:
In [6]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(imap(int, f.read().split()))
   ...:
100 loops, best of 3: 6.85 ms per loop
The same applies on Python 3 with map:
In [4]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(map(int,f.read().split()))
   ...:
100 loops, best of 3: 4.41 ms per loop
So if speed is all you care about, use the list(map(int,f.read().split())) or list(imap(int,f.read().split())) approach.
If memory is also a concern, combine it with chain. A further advantage of the chain approach is that if you are passing the ints to a function, or just iterating over them, you can pass the map/chain object directly, so you never need to hold all the data in memory at once.
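For example, a small sketch that sums the values without ever building a list (assuming a sum is all you need):

from itertools import chain

with open("in.txt") as f:
    ints = map(int, chain.from_iterable(line.split() for line in f))
    print(sum(ints))   # the iterator is consumed lazily; no list is ever built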
One last small optimisation is to map str.split on the file object:
In [5]: %%timeit
   ...: with open("ints.txt", "r") as f:
   ...:     list(map(int, chain.from_iterable(map(str.split, f))))
   ...:
100 loops, best of 3: 5.32 ms per loop
with open('yourfile.txt') as f:
    your_list = f.read().split()

To cast the values to integers, you can use a list comprehension:

your_list = [int(i) for i in f.read().split()]

This might raise an exception if a value cannot be cast to an integer.
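If that matters for your data, here is one hedged sketch of skipping bad tokens instead of letting the exception propagate:

with open('yourfile.txt') as f:
    your_list = []
    for token in f.read().split():
        try:
            your_list.append(int(token))
        except ValueError:
            pass   # skip tokens that are not valid integers (or log them)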