I'm trying to read in a file that looks like this:
1, 2,
3, 4,
I'm using the following line:
l1,l2 = numpy.loadtxt('file.txt',unpack=True,delimiter=', ')
This gives me an error because the end comma in each row is lumped together as the last element (e.g. "2" is read as "2,"). Is there a way to ignore the last comma in each row, with loadtxt or another function?
From the numpy.loadtxt documentation: Load data from a text file. Each row in the text file must have the same number of values.
fname : File, filename, list, or generator to read.
dtype : Data-type of the resulting array; default: float. If this is a structured data-type, the resulting array will be 1-dimensional, and each row will be interpreted as an element of the array.
unpack : When unpack is True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...).
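As a small illustration of what unpack does, here is a sketch on well-formed data (no trailing commas). The in-memory buffer and the variable names are assumptions made for the example; they are not part of the original question.

```python
import io
import numpy as np

# Well-formed sample data (no trailing commas), held in memory instead of a file.
data = io.StringIO("1, 2\n3, 4\n")

# Without unpack, loadtxt returns one row per line of input.
rows = np.loadtxt(data, delimiter=",")

data.seek(0)  # rewind so the same buffer can be read again

# With unpack=True the array is transposed, so each column
# can be assigned to its own variable.
x, y = np.loadtxt(data, delimiter=",", unpack=True)

print(rows)
print(x, y)
```

Here x holds the first column and y the second, which is what the l1,l2 assignment in the question is aiming for.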
numpy.genfromtxt is a bit more robust. If you use the default dtype (which is np.float64), it thinks there is a third column with missing values, so it creates a third column containing nan. If you give it dtype=None (which tells it to figure out the data type from the file), it returns a third column containing all zeros. Either way, you can ignore the last column by using usecols=[0, 1]:
In [14]: !cat trailing_comma.csv
1, 2,
3, 4,
Important note: I use delimiter=',', not delimiter=', '.
In [15]: np.genfromtxt('trailing_comma.csv', delimiter=',', dtype=None, usecols=[0,1])
Out[15]:
array([[1, 2],
       [3, 4]])
In [16]: col1, col2 = np.genfromtxt('trailing_comma.csv', delimiter=',', dtype=None, usecols=[0,1], unpack=True)
In [17]: col1
Out[17]: array([1, 3])
In [18]: col2
Out[18]: array([2, 4])
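The same behaviour can be reproduced without a file on disk. This is a minimal sketch that assumes the file contents shown above and feeds them to genfromtxt through io.StringIO; the names sample, full and trimmed are made up for the example.

```python
import io
import numpy as np

# Same contents as trailing_comma.csv, kept in memory for the example.
sample = "1, 2,\n3, 4,\n"

# Default dtype (float): the empty field after the last comma is
# treated as a missing value and becomes nan.
full = np.genfromtxt(io.StringIO(sample), delimiter=",")

# Restricting to the first two columns drops the empty trailing field.
trimmed = np.genfromtxt(io.StringIO(sample), delimiter=",", usecols=[0, 1])

print(full)
print(trimmed)
```

With the default dtype the trimmed result is a float array; pass dtype=None as in the session above if you want the type inferred from the data.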
usecols also works with loadtxt:
Simulate a file with text split into lines:
In [162]: txt=b"""1, 2,
3,4,"""
In [163]: txt=txt.splitlines()
In [164]: txt
Out[164]: [b'1, 2,', b'3,4,']
In [165]: x,y=np.loadtxt(txt,delimiter=',',usecols=[0,1],unpack=True)
In [166]: x
Out[166]: array([ 1., 3.])
In [167]: y
Out[167]: array([ 2., 4.])
loadtxt and genfromtxt don't work well with multicharacter delimiters.
loadtxt and genfromtxt accept any iterable, including a generator. Thus you could open the file and process the lines one by one, removing the extra character.
In [180]: def g(txt):
   .....:     t = txt.splitlines()
   .....:     for l in t:
   .....:         yield l[:-1]
In [181]: list(g(txt))
Out[181]: [b'1, 2', b'3,4']
g is a generator that yields the lines one by one, each stripped of its last character; it could be changed to read a file line by line. Its output can be passed directly to loadtxt:
In [182]: x,y=np.loadtxt(g(txt),delimiter=',',unpack=True)
In [183]: x,y
Out[183]: (array([ 1., 3.]), array([ 2., 4.]))
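One way the generator might be adapted to read from a file is sketched below. The function name strip_trailing_comma and the temporary sample file are assumptions for illustration. It uses rstrip(',') rather than slicing off the last character, so a line that happens to have no trailing comma is left intact.

```python
import os
import tempfile
import numpy as np

def strip_trailing_comma(path):
    """Yield each line of the file with trailing commas removed."""
    with open(path) as f:
        for line in f:
            # Drop the newline and surrounding whitespace first,
            # then any commas left at the end of the line.
            yield line.strip().rstrip(",")

# Write a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1, 2,\n3, 4,\n")
    sample_path = tmp.name

# loadtxt consumes the generator line by line.
x, y = np.loadtxt(strip_trailing_comma(sample_path), delimiter=",", unpack=True)
os.remove(sample_path)

print(x, y)
```

Because the lines are cleaned before loadtxt sees them, no usecols argument is needed here.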