
Creating List From File In Python

Tags: python, file, list

The file contains:

1 19 15 36 23 18 39 
2 36 23 4 18 26 9
3 35 6 16 11

From that I'd like to extract a list as follows:

L = [1,19,15,36,23,18,39,2,36........... etc.]

What is the most efficient way to do so?

Nikolay Gospodinov asked Aug 08 '15 12:08


2 Answers

You can use itertools.chain, splitting each line and mapping to ints:

from itertools import chain
with open("in.txt") as f:
    print(list(map(int, chain.from_iterable(line.split() for line in f))))
[1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]

For python2, use itertools.imap instead of map. Using map with itertools.chain avoids reading the whole file into memory at once, which is what .read will do.

Some timings for python3 on a file containing your input repeated 1000 times:

In [5]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(map(int,re.split(r"\s+",f.read())))
   ...: 
100 loops, best of 3: 8.55 ms per loop

In [6]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(map(int, chain.from_iterable(line.split() for line in f)))
   ...: 
100 loops, best of 3: 5.76 ms per loop

In [7]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     [int(i) for i in f.read().split()]
   ...: 
100 loops, best of 3: 5.82 ms per loop

So the itertools version matches the list comprehension on speed but avoids reading the whole file into memory as a single string.
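The timings above come from IPython's %%timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The temporary file and the function names with_chain and with_read_split are assumptions for illustration, and the numbers will differ from machine to machine:

```python
import os
import tempfile
import timeit
from itertools import chain

# Hypothetical test file: the question's input repeated 1000 times.
sample = "1 19 15 36 23 18 39 \n2 36 23 4 18 26 9\n3 35 6 16 11\n" * 1000
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name


def with_chain():
    # Splits line by line, so the file is never read as one big string.
    with open(path) as f:
        return list(map(int, chain.from_iterable(line.split() for line in f)))


def with_read_split():
    # Reads the whole file into one string before splitting.
    with open(path) as f:
        return [int(i) for i in f.read().split()]


try:
    # Sanity check: both approaches must produce the same list.
    same = with_chain() == with_read_split()
    for fn in (with_chain, with_read_split):
        # Best of 3 repeats, 20 calls each, reported per call.
        best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
        print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
finally:
    os.remove(path)
```

Taking the minimum of several repeats mirrors what %%timeit reports as "best of 3".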

For python2:

In [3]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     [int(i) for i in f.read().split()]
   ...: 
100 loops, best of 3: 7.79 ms per loop

In [4]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(imap(int, chain.from_iterable(line.split() for line in f)))
   ...: 
100 loops, best of 3: 8.03 ms per loop

In [5]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(imap(int,re.split(r"\s+",f.read())))
   ...: 
100 loops, best of 3: 10.6 ms per loop

The list comprehension is marginally faster but again uses more memory. If you are going to read everything into memory with the read/split approach anyway, imap is again the fastest:

In [6]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:      list(imap(int, f.read().split()))
   ...: 
100 loops, best of 3: 6.85 ms per loop

Same for python3 and map:

In [4]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(map(int,f.read().split()))
   ...: 
100 loops, best of 3: 4.41 ms per loop

So if speed is all you care about, use the list(map(int,f.read().split())) or list(imap(int,f.read().split())) approach.
If memory is also a concern, use map (or imap) with chain instead. Another advantage of the chain approach is that if you are passing the ints to a function or iterating over them, you can pass the chain object directly, so you never need to hold all the data in memory at once.
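To illustrate that last point, here is a minimal sketch that feeds the lazy chain straight into a consumer without ever building a list. The helper name iter_ints and the temporary sample file are assumptions for illustration, not part of the timings above:

```python
import os
import tempfile
from itertools import chain


def iter_ints(path):
    """Lazily yield every whitespace-separated integer in the file at `path`."""
    with open(path) as f:
        # str.split is applied one line at a time as the consumer asks for values.
        for token in chain.from_iterable(map(str.split, f)):
            yield int(token)


# Hypothetical sample file with the contents from the question.
sample = "1 19 15 36 23 18 39 \n2 36 23 4 18 26 9\n3 35 6 16 11\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    total = sum(iter_ints(path))    # consumes the generator; no list is built
    largest = max(iter_ints(path))  # a fresh generator for a second pass
finally:
    os.remove(path)
```

A generator can only be consumed once, which is why the second pass calls iter_ints again rather than reusing the first object.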

One last small optimisation is to map str.split on the file object:

In [5]: %%timeit
   ...: with open("ints.txt", "r") as f:
   ...:     list(map(int, chain.from_iterable(map(str.split, f))))
   ...: 
100 loops, best of 3: 5.32 ms per loop
Padraic Cunningham answered Sep 17 '22 02:09


with open('yourfile.txt') as f:
    your_list = f.read().split()

To convert the values to integers, you can use a list comprehension:

with open('yourfile.txt') as f:
    your_list = [int(i) for i in f.read().split()]

This raises a ValueError if a value cannot be converted to an integer.
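If the file may contain tokens that are not integers, one option is to catch the exception per token. This is only a sketch: the function name parse_ints and its skip_invalid flag are hypothetical, and whether to skip or fail on bad tokens depends on your data:

```python
def parse_ints(text, skip_invalid=True):
    """Return the integers in `text`, optionally skipping non-integer tokens."""
    values = []
    for token in text.split():
        try:
            values.append(int(token))
        except ValueError:
            # Re-raise unless the caller asked to ignore bad tokens.
            if not skip_invalid:
                raise
    return values


result = parse_ints("1 19 abc 15\n2 x7 36")
```

Here "abc" and "x7" are dropped and the remaining values are kept in file order.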

Klaus D. answered Sep 18 '22 02:09