I want to read data from a file that has many missing values, as in this example:
1,2,3,4,5
6,,,7,8
,,9,10,11
I am using the numpy.loadtxt function:
data = numpy.loadtxt('test.data', delimiter=',')
The problem is that the missing values break loadtxt (I get a "ValueError: could not convert string to float:", no doubt because of the two or more consecutive delimiters).
Is there a way to do this automatically, with loadtxt or another function, or do I have to bite the bullet and parse each line manually?
I'd probably use genfromtxt:
>>> from numpy import genfromtxt
>>> genfromtxt("missing1.dat", delimiter=",")
array([[ 1., 2., 3., 4., 5.],
[ 6., nan, nan, 7., 8.],
[ nan, nan, 9., 10., 11.]])
and then do whatever with the nans (change them to something, use a mask instead, etc.) Some of this could be done inline:
>>> genfromtxt("missing1.dat", delimiter=",", filling_values=99)
array([[ 1., 2., 3., 4., 5.],
[ 6., 99., 99., 7., 8.],
[ 99., 99., 9., 10., 11.]])
This is because the function expects to return a numpy array with all cells of the same type.
If you want a table with mixed strings and number, you should read it into a structured array instead, also you probably want to add skip_header=1
to skip the first line, ie in your case something like:
np.genfromtxt('upeak_names.txt', delimiter="\t", dtype="S10,S10,f4,S10,f4,S10,f4",
names=["id", "name", "Distance", "name2", "Distance2", "name3", "Distance3], skip_header=1)
See also:
Documentation for genfromtxt: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.genfromtxt.html
Documentation for structured arrays in numpy:
https://docs.scipy.org/doc/numpy-1.15.0/user/basics.rec.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With