Python - numpy.loadtxt how to ignore end commas?

Tags:

python

numpy

I'm trying to read in a file that looks like this:

1, 2,
3, 4,

I'm using the following line:

l1,l2 = numpy.loadtxt('file.txt',unpack=True,delimiter=', ')

This gives me an error because the end comma in each row is lumped together as the last element (e.g. "2" is read as "2,"). Is there a way to ignore the last comma in each row, with loadtxt or another function?


asked Nov 08 '15 19:11

ylangylang

2 Answers

numpy.genfromtxt is a bit more robust. If you use the default dtype (np.float64), it treats the empty field after each trailing comma as a third column of missing values and fills that column with nan. If you give it dtype=None (which tells it to infer the data type from the file), it returns a third column containing all zeros. Either way, you can ignore the last column with usecols=[0, 1]:

In [14]: !cat trailing_comma.csv
1, 2,
3, 4,

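Important note: I use delimiter=',', not delimiter=', '. The space left at the start of each field is harmless, because it is ignored when the field is converted to a number.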

In [15]: np.genfromtxt('trailing_comma.csv', delimiter=',', dtype=None, usecols=[0,1])
Out[15]: 
array([[1, 2],
       [3, 4]])

In [16]: col1, col2 = np.genfromtxt('trailing_comma.csv', delimiter=',', dtype=None, usecols=[0,1], unpack=True)

In [17]: col1
Out[17]: array([1, 3])

In [18]: col2
Out[18]: array([2, 4])
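A self-contained version of the same idea, as a sketch: it uses io.StringIO in place of trailing_comma.csv (an assumption made only so the snippet runs without a file on disk) and keeps the default float dtype, so the extra column shows up as nan rather than zeros.

```python
import io

import numpy as np

data = "1, 2,\n3, 4,\n"

# Without usecols, the empty field after each trailing comma becomes a
# third column; with the default float dtype it is filled with nan.
full = np.genfromtxt(io.StringIO(data), delimiter=",")
print(full.shape)  # (2, 3)

# usecols keeps only the first two columns, and unpack=True returns them
# as separate arrays, matching the l1, l2 = ... form from the question.
l1, l2 = np.genfromtxt(io.StringIO(data), delimiter=",", usecols=(0, 1), unpack=True)
print(l1, l2)  # [1. 3.] [2. 4.]
```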

answered Oct 05 '22 02:10

Warren Weckesser

usecols also works with loadtxt:

Simulate a file with text split into lines:

In [162]: txt=b"""1, 2,
3,4,"""
In [163]: txt=txt.splitlines()
In [164]: txt
Out[164]: [b'1, 2,', b'3,4,']

In [165]: x,y=np.loadtxt(txt,delimiter=',',usecols=[0,1],unpack=True)
In [166]: x
Out[166]: array([ 1.,  3.])
In [167]: y
Out[167]: array([ 2.,  4.])
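The same loadtxt call should also work when given a file path, as in the question. The sketch below writes a temporary file standing in for file.txt (the tempfile handling is only there to make the example self-contained) and reads it with usecols.

```python
import os
import tempfile

import numpy as np

# Write a sample file shaped like the question's file.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1, 2,\n3, 4,\n")
    path = tmp.name

try:
    # Split on "," alone and keep only columns 0 and 1, so the empty
    # field left by each trailing comma is ignored.
    l1, l2 = np.loadtxt(path, delimiter=",", usecols=(0, 1), unpack=True)
finally:
    os.remove(path)

print(l1, l2)
```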

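loadtxt and genfromtxt don't work well with multicharacter delimiters such as ', ': the final comma on each line has no space after it, so it is never recognized as a delimiter.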
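To see the splitting problem concretely, plain str.split can stand in for the field splitting. This is only an approximation for illustration; numpy's own parser is not called here.

```python
line = "1, 2,"

# With the two-character delimiter ", " the last comma has no space
# after it, so it stays attached to the last field.
with_space = line.split(", ")
print(with_space)  # ['1', '2,']

# With "," alone, the trailing comma produces an empty third field,
# which is what usecols=[0, 1] lets you skip.
comma_only = line.split(",")
print(comma_only)  # ['1', ' 2', '']
```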

loadtxt and genfromtxt accept any iterable of lines, including a generator. So you could open the file and process its lines one by one, removing the extra character before numpy sees it.

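In [180]: def g(lines):
   .....:     # txt is already a list of lines (In [163]), so iterate over it directly
   .....:     for l in lines:
   .....:         yield l[:-1]  # drop the last character (the trailing comma)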

In [181]: list(g(txt))
Out[181]: [b'1, 2', b'3,4']

g is a generator that yields the lines one by one, each stripped of its last character. It could be changed to read a file line by line; either way, loadtxt consumes it directly:

In [182]: x,y=np.loadtxt(g(txt),delimiter=',',unpack=True)
In [183]: x,y
Out[183]: (array([ 1.,  3.]), array([ 2.,  4.]))
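One way to adapt the generator to a real file is sketched below. The helper name strip_trailing_commas is hypothetical, and it uses rstrip instead of slicing off one character, because lines read from a file end with a newline, so l[:-1] would remove the newline rather than the comma. The temporary file only stands in for file.txt.

```python
import os
import tempfile

import numpy as np


def strip_trailing_commas(path):
    """Hypothetical helper: yield each line without its newline and trailing comma(s)."""
    with open(path) as fh:
        for line in fh:
            # rstrip() removes the newline and trailing spaces,
            # then rstrip(",") drops the trailing comma itself
            cleaned = line.rstrip().rstrip(",")
            if cleaned:  # skip blank lines
                yield cleaned


# Write a sample file shaped like the question's file.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1, 2,\n3, 4,\n")
    path = tmp.name

try:
    l1, l2 = np.loadtxt(strip_trailing_commas(path), delimiter=",", unpack=True)
finally:
    os.remove(path)

print(l1, l2)
```

Stripping with rstrip(",") rather than a fixed slice also tolerates lines that lack the trailing comma.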

answered Oct 05 '22 01:10

hpaulj
