Python - numpy.loadtxt how to ignore end commas?

Tags: python, numpy

I'm trying to read in a file that looks like this:

1, 2,
3, 4,

I'm using the following line:

l1,l2 = numpy.loadtxt('file.txt',unpack=True,delimiter=', ')

This gives me an error because the end comma in each row is lumped together as the last element (e.g. "2" is read as "2,"). Is there a way to ignore the last comma in each row, with loadtxt or another function?

ylangylang asked Nov 08 '15 19:11



2 Answers

numpy.genfromtxt is a bit more robust. If you use the default dtype (which is np.float64), it thinks there is a third column with missing values, so it creates a third column containing nan. If you give it dtype=None (which tells it to figure out the data type from the file), it returns a third column containing all zeros. Either way, you can ignore the last column by using usecols=[0, 1]:

In [14]: !cat trailing_comma.csv
1, 2,
3, 4,

Important note: I use delimiter=',', not delimiter=', '.

In [15]: np.genfromtxt('trailing_comma.csv', delimiter=',', dtype=None, usecols=[0,1])
Out[15]: 
array([[1, 2],
       [3, 4]])

In [16]: col1, col2 = np.genfromtxt('trailing_comma.csv', delimiter=',', dtype=None, usecols=[0,1], unpack=True)

In [17]: col1
Out[17]: array([1, 3])

In [18]: col2
Out[18]: array([2, 4])
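
For readers who want to try this without creating a file, here is a minimal sketch of the same idea. It assumes an in-memory `io.StringIO` buffer in place of `trailing_comma.csv` and uses the default float dtype; the variable names are illustrative only.

```python
import io
import numpy as np

# Same layout as the file in the question: every row ends with a comma.
data = "1, 2,\n3, 4,\n"

# With the default float dtype, the empty field after the trailing comma
# is treated as a missing value, so a third column of nan appears.
full = np.genfromtxt(io.StringIO(data), delimiter=",")
print(full.shape)

# usecols keeps only the first two columns and drops the empty one.
cols = np.genfromtxt(io.StringIO(data), delimiter=",", usecols=(0, 1))
col1, col2 = cols.T
print(col1, col2)
```

Transposing the result by hand has the same effect as passing `unpack=True`.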
Warren Weckesser answered Oct 05 '22 02:10


usecols also works with loadtxt:

Simulate a file with text split into lines:

In [162]: txt=b"""1, 2,
3,4,"""
In [163]: txt=txt.splitlines()
In [164]: txt
Out[164]: [b'1, 2,', b'3,4,']

In [165]: x,y=np.loadtxt(txt,delimiter=',',usecols=[0,1],unpack=True)
In [166]: x
Out[166]: array([ 1.,  3.])
In [167]: y
Out[167]: array([ 2.,  4.])

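loadtxt and genfromtxt don't work well with multicharacter delimiters, so split on ',' alone rather than ', '; the leftover spaces do no harm when the fields are converted to numbers.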
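
A small sketch of the single-character delimiter approach, assuming the question's data is held in an `io.StringIO` buffer rather than `file.txt`:

```python
import io
import numpy as np

# Same layout as the question's file: comma-plus-space separators
# and a trailing comma on every row.
data = "1, 2,\n3, 4,\n"

# Split on the single character ',' only. The space left in front of
# each number is tolerated by the float conversion, and usecols drops
# the empty third field produced by the trailing comma.
x, y = np.loadtxt(io.StringIO(data), delimiter=",", usecols=(0, 1), unpack=True)

print(x)
print(y)
```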

loadtxt and genfromtxt accept any iterable, including a generator. Thus you could open the file and process the lines one by one, removing the extra character.

In [180]: def g(lines):
   .....:     # lines is already a list of lines (see In [163]), so iterate over it directly
   .....:     for l in lines:
   .....:         yield l[:-1]

In [181]: list(g(txt))
Out[181]: [b'1, 2', b'3,4']

This generator yields the lines one by one, each stripped of its last character, and can be passed straight to loadtxt, so usecols is no longer needed. It could be adapted to read a file line by line.

In [182]: x,y=np.loadtxt(g(txt),delimiter=',',unpack=True)
In [183]: x,y
Out[183]: (array([ 1.,  3.]), array([ 2.,  4.]))
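
One possible file-based version is sketched below. The helper name `lines_without_trailing_comma` and the temporary file are assumptions made for illustration. Lines read from a file keep their newline, so slicing with `l[:-1]` would remove the newline rather than the comma; this sketch uses `rstrip` instead.

```python
import os
import tempfile
import numpy as np

def lines_without_trailing_comma(path):
    """Yield each line of `path` with its newline and any trailing comma removed."""
    with open(path) as fh:
        for line in fh:
            # First strip the newline and trailing whitespace, then the comma.
            yield line.rstrip().rstrip(",")

# Write a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("1, 2,\n3, 4,\n")
    path = tmp.name

try:
    # loadtxt consumes the generator, so each line is cleaned as it is read.
    x, y = np.loadtxt(lines_without_trailing_comma(path), delimiter=",", unpack=True)
finally:
    os.remove(path)

print(x, y)
```

Unlike the fixed slice, `rstrip(",")` leaves rows without a trailing comma unchanged.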
hpaulj answered Oct 05 '22 01:10