
Loading text file containing both float and string using numpy.loadtxt

I have a text file, data.txt, which contains:

    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    5.8,2.7,4.1,1.0,Iris-versicolor
    6.2,2.2,4.5,1.5,Iris-versicolor
    6.4,3.1,5.5,1.8,Iris-virginica
    6.0,3.0,4.8,1.8,Iris-virginica

How do I load this data using numpy.loadtxt() so that I get a NumPy array after loading such as [['5.1' '3.5' '1.4' '0.2' 'Iris-setosa'] ['4.9' '3.0' '1.4' '0.2' 'Iris-setosa'] ...]?

I tried

    np.loadtxt(open("data.txt"), 'r',
               dtype={
                   'names': (
                       'sepal length', 'sepal width', 'petal length',
                       'petal width', 'label'),
                   'formats': (
                       np.float, np.float, np.float, np.float, np.str)},
               delimiter=',', skiprows=0)
VeilEclipse asked May 08 '14 15:05




2 Answers

If you use np.genfromtxt, you can specify dtype=None, which tells genfromtxt to guess the dtype of each column. Most conveniently, it relieves you of the burden of specifying the number of bytes required for the string column. (Omitting the number of bytes, by specifying e.g. np.str, does not work.)

    In [58]: np.genfromtxt('data.txt', delimiter=',', dtype=None,
       ...:                names=('sepal length', 'sepal width', 'petal length', 'petal width', 'label'))
    Out[58]:
    array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
           (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
           (5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
           (6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
           (6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
           (6.0, 3.0, 4.8, 1.8, 'Iris-virginica')],
          dtype=[('sepal_length', '<f8'), ('sepal_width', '<f8'), ('petal_length', '<f8'),
                 ('petal_width', '<f8'), ('label', 'S15')])
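
For anyone who wants to try this without creating a file, here is a minimal sketch that feeds the same six rows to genfromtxt from memory. The io.StringIO stand-in for data.txt and the encoding='utf-8' argument are my additions, not part of the session above; the encoding is there so the label column comes back as ordinary str values rather than bytes.

```python
import io

import numpy as np

# The six rows of data.txt, kept in memory so the sketch is self-contained.
DATA = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

names = ('sepal length', 'sepal width', 'petal length', 'petal width', 'label')

# dtype=None lets genfromtxt infer a type per column.
# encoding='utf-8' (an assumption for this sketch) yields str labels, not bytes.
iris = np.genfromtxt(io.StringIO(DATA), delimiter=',', dtype=None,
                     names=names, encoding='utf-8')

# genfromtxt replaces the spaces in field names with underscores.
print(iris.dtype.names)
print(iris['sepal_length'])
print(iris['label'])
```

Note that the fields are accessed as iris['sepal_length'], not iris['sepal length'], because genfromtxt sanitizes the names.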

If you do want to use np.loadtxt, then to fix your code with minimal changes, you could use:

    np.loadtxt("data.txt",
               dtype={'names': ('sepal length', 'sepal width', 'petal length', 'petal width', 'label'),
                      'formats': (np.float, np.float, np.float, np.float, '|S15')},
               delimiter=',', skiprows=0)

The main difference is simply changing np.str to |S15 (a 15-byte string).

Also note that open("data.txt"), 'r' should be open("data.txt", 'r'). But since np.loadtxt can accept a filename, you don't really need to use open at all.
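
If you are on a recent NumPy, the np.float and np.str aliases are gone (they were removed in NumPy 1.24), so here is a sketch of the same loadtxt idea using the built-in float and a 'U15' Unicode field instead of '|S15'. The io.StringIO stand-in for data.txt and the 'U15' choice are my assumptions. The second call is a guess at what the question literally asks for: passing dtype=str should give a plain 2-D array in which every value, numbers included, is a string.

```python
import io

import numpy as np

# The six rows of data.txt, kept in memory so the sketch is self-contained.
DATA = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

# Structured dtype: four float fields plus a label of up to 15 characters.
# 'U15' (Unicode) is an assumption; '|S15' would store bytes instead.
iris_dtype = np.dtype({
    'names': ['sepal length', 'sepal width', 'petal length', 'petal width', 'label'],
    'formats': [float, float, float, float, 'U15'],
})

# One record per row; loadtxt keeps the field names exactly as given.
iris = np.loadtxt(io.StringIO(DATA), dtype=iris_dtype, delimiter=',')
print(iris['petal width'])
print(iris['label'])

# Alternative: read every column as text, giving a 6x5 array of strings.
as_text = np.loadtxt(io.StringIO(DATA), dtype=str, delimiter=',')
print(as_text)
```

The structured array keeps the numbers usable as floats, while the all-string array matches the layout shown in the question but needs a cast (e.g. as_text[:, :4].astype(float)) before doing any arithmetic.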

unutbu answered Oct 02 '22 15:10


It seems that keeping the numbers and text together is what has been causing you trouble. If you end up deciding to separate them, my workaround is:

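    values = np.loadtxt('data.txt', delimiter=',', usecols=[0, 1, 2, 3])
    # dtype=str is needed here: loadtxt defaults to float and cannot parse the labels
    labels = np.loadtxt('data.txt', delimiter=',', usecols=[4], dtype=str)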
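
A self-contained sketch of the split approach, in case it helps to check the shapes. The io.StringIO stand-in for the file is an assumption, as is the last step, which turns the labels into integer codes with np.unique; that step is only one possible way to use the separated label column. The label read passes dtype=str because loadtxt otherwise tries to convert every value to float.

```python
import io

import numpy as np

# The six rows of data.txt, kept in memory so the sketch is self-contained.
DATA = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

# Numeric columns only: a regular 2-D float array of shape (6, 4).
values = np.loadtxt(io.StringIO(DATA), delimiter=',', usecols=[0, 1, 2, 3])

# Label column only: dtype=str avoids the default float conversion.
labels = np.loadtxt(io.StringIO(DATA), delimiter=',', usecols=[4], dtype=str)

# Optional: map each label to an integer code (sorted unique labels -> 0, 1, 2).
classes, codes = np.unique(labels, return_inverse=True)

print(values.shape, labels.shape)
print(classes)
print(codes)
```

Keeping values as a plain float array means the usual NumPy arithmetic works on it directly, with labels (or codes) indexed row by row alongside it.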
mauve answered Oct 02 '22 13:10