The file contains:
1 19 15 36 23 18 39
2 36 23 4 18 26 9
3 35 6 16 11
From that I'd like to extract a list as follows:
L = [1,19,15,36,23,18,39,2,36,... etc.]
What is the most efficient way to do so?
You can read a text file with open() and readlines(), and split each line into a list of values with split(), which breaks a string into a list on whitespace (or any separator you pass).
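For instance, a minimal sketch of that open()/split() approach (the file name in.txt is just a placeholder, and the values are still strings at this point):

numbers = []
with open("in.txt") as f:
    for line in f.readlines():
        numbers.extend(line.split())   # split each line on whitespace
print(numbers)   # e.g. ['1', '19', '15', ...]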
You can use itertools.chain, splitting each line and mapping to ints:
from itertools import chain

with open("in.txt") as f:
    print(list(map(int, chain.from_iterable(line.split() for line in f))))
[1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]
For Python 2, use itertools.imap instead of map. Using map with chain.from_iterable over the file's lines avoids reading the whole file into memory at once, which is what .read() would do.
Some timings on Python 3, on a file containing your input repeated 1000 times (with re and itertools.chain already imported in the session):
In [5]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(map(int,re.split(r"\s+",f.read())))
   ...:
100 loops, best of 3: 8.55 ms per loop

In [6]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(map(int, chain.from_iterable(line.split() for line in f)))
   ...:
100 loops, best of 3: 5.76 ms per loop

In [7]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     [int(i) for i in f.read().split()]
   ...:
100 loops, best of 3: 5.82 ms per loop
So the itertools approach matches the list comp on speed but uses a lot less memory.
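If you want to verify the memory claim for your own data, tracemalloc from the standard library can report peak allocations; here is a small sketch (the file name and helper are illustrative, no numbers are implied):

import tracemalloc
from itertools import chain

def peak_bytes(fn):
    # run fn once and return the peak traced allocation in bytes
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def read_split():
    with open("ints.txt") as f:
        return [int(i) for i in f.read().split()]

def chained():
    with open("ints.txt") as f:
        return list(map(int, chain.from_iterable(line.split() for line in f)))

print("read/split peak:", peak_bytes(read_split))
print("chain peak:     ", peak_bytes(chained))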
For Python 2:
In [3]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     [int(i) for i in f.read().split()]
   ...:
100 loops, best of 3: 7.79 ms per loop

In [4]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(imap(int, chain.from_iterable(line.split() for line in f)))
   ...:
100 loops, best of 3: 8.03 ms per loop

In [5]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(imap(int,re.split(r"\s+",f.read())))
   ...:
100 loops, best of 3: 10.6 ms per loop
The list comp is marginally faster but again uses more memory. If you were going to read everything into memory with the read/split approach anyway, imap is again the fastest:
In [6]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(imap(int, f.read().split()))
   ...:
100 loops, best of 3: 6.85 ms per loop
The same applies on Python 3 with map:
In [4]: %%timeit
   ...: with open("ints.txt","r") as f:
   ...:     list(map(int,f.read().split()))
   ...:
100 loops, best of 3: 4.41 ms per loop
So if speed is all you care about, use the list(map(int,f.read().split())) or list(imap(int,f.read().split())) approach.
If memory is also a concern, combine it with chain. A further advantage of the chain approach is that if you are passing the ints to a function, or just iterating over them, you can pass the map/chain object directly, so you never need to hold all the data in memory at once.
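For example, a small sketch that sums the values without ever building a list (assuming a sum is all you need):

from itertools import chain

with open("in.txt") as f:
    ints = map(int, chain.from_iterable(line.split() for line in f))
    print(sum(ints))   # the iterator is consumed lazily; no list is ever built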
One last small optimisation is to map str.split on the file object:
In [5]: %%timeit
   ...: with open("ints.txt", "r") as f:
   ...:     list(map(int, chain.from_iterable(map(str.split, f))))
   ...:
100 loops, best of 3: 5.32 ms per loop
with open('yourfile.txt') as f:
    your_list = f.read().split()

To cast the values to integers, you can use a list comprehension:

your_list = [int(i) for i in f.read().split()]

This might raise an exception if a value cannot be cast to an integer.
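If that matters for your data, here is one hedged sketch of skipping bad tokens instead of letting the exception propagate:

with open('yourfile.txt') as f:
    your_list = []
    for token in f.read().split():
        try:
            your_list.append(int(token))
        except ValueError:
            pass   # skip tokens that are not valid integers (or log them)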