Loading text file containing both float and string using numpy.loadtxt

Tags: python, python-3.x, numpy, python-2.7

I have a text file, data.txt, which contains:

```
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
```

How do I load this data using numpy.loadtxt() so that I get a NumPy array after loading such as [['5.1' '3.5' '1.4' '0.2' 'Iris-setosa'] ['4.9' '3.0' '1.4' '0.2' 'Iris-setosa'] ...]?

I tried

```python
np.loadtxt(open("data.txt"), 'r',
           dtype={
               'names': (
                   'sepal length', 'sepal width', 'petal length',
                   'petal width', 'label'),
               'formats': (
                   np.float, np.float, np.float, np.float, np.str)},
           delimiter= ',', skiprows=0)
```


asked May 08 '14 15:05

VeilEclipse

2 Answers

If you use np.genfromtxt, you can specify dtype=None, which tells genfromtxt to infer the dtype of each column from the data. Most conveniently, it relieves you of the burden of specifying the number of bytes required for the string column. (Omitting the number of bytes, e.g. by specifying np.str, does not work.)

```python
In [58]: np.genfromtxt('data.txt', delimiter=',', dtype=None, names=('sepal length', 'sepal width', 'petal length', 'petal width', 'label'))
Out[58]: 
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
       (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
       (5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
       (6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
       (6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
       (6.0, 3.0, 4.8, 1.8, 'Iris-virginica')],
       dtype=[('sepal_length', '<f8'), ('sepal_width', '<f8'), ('petal_length', '<f8'), ('petal_width', '<f8'), ('label', 'S15')])
```
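The session above was recorded on Python 2, where 'S15' holds ordinary strings. On Python 3 with NumPy 1.14 or later, a reasonable variant (a sketch, not the only option) is to also pass encoding=None, so the label column is decoded to str ('<U15') instead of bytes. To keep the sketch self-contained it reads the sample rows from an io.StringIO; passing the filename 'data.txt' instead should behave the same. SAMPLE is just an illustrative variable name.

```python
import io

import numpy as np

# The rows from the question; a filename such as "data.txt" can be passed instead.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

data = np.genfromtxt(
    io.StringIO(SAMPLE),
    delimiter=",",
    dtype=None,      # infer each column's type from the data
    encoding=None,   # decode the text so the label column is str, not bytes
    names=("sepal length", "sepal width", "petal length", "petal width", "label"),
)

# genfromtxt replaces spaces in the requested names with underscores.
print(data.dtype.names)
print(data["label"])                 # the string column
print(data["sepal_length"].mean())   # numeric columns are ordinary float arrays
```

Note that the field names come back with underscores, so the columns are accessed as data["sepal_length"] rather than data["sepal length"].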

If you do want to use np.loadtxt, then to fix your code with minimal changes, you could use:

```python
import numpy as np

np.loadtxt("data.txt",
           dtype={'names': ('sepal length', 'sepal width', 'petal length', 'petal width', 'label'),
                  # built-in float: the np.float alias was removed in NumPy 1.24
                  'formats': (float, float, float, float, '|S15')},
           delimiter=',', skiprows=0)
```

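The main difference is simply changing np.str to |S15 (a 15-byte string, just long enough for the longest label, Iris-versicolor). On NumPy 1.24 and later the np.float alias no longer exists either, so use the built-in float there.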

Also note that open("data.txt"), 'r' should be open("data.txt", 'r'). But since np.loadtxt can accept a filename, you don't really need to use open at all.
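Once loaded this way, the result is a structured array, so each column is accessed by the field name from the dtype. A minimal sketch, assuming Python 3 and a recent NumPy; it reads the sample rows from an io.StringIO rather than data.txt so it runs standalone, and the names SAMPLE and iris_dtype are only illustrative. Because the label field is 'S15', its values are bytes on Python 3, and astype(str) is one way to turn them into ordinary strings.

```python
import io

import numpy as np

# The rows from the question; passing "data.txt" instead should work the same way.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

iris_dtype = {
    "names": ("sepal length", "sepal width", "petal length", "petal width", "label"),
    # Built-in float instead of np.float; "S15" fits "Iris-versicolor" (15 characters).
    "formats": (float, float, float, float, "S15"),
}

arr = np.loadtxt(io.StringIO(SAMPLE), dtype=iris_dtype, delimiter=",")

# loadtxt keeps the field names exactly as given, spaces included.
petal_length = arr["petal length"]
# "S15" fields hold bytes; astype(str) decodes them to regular Python 3 strings.
labels = arr["label"].astype(str)
print(petal_length)
print(labels)
```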


answered Oct 02 '22 15:10

unutbu

Keeping the numbers and text together seems to be causing you a lot of trouble. If you decide to separate them, my workaround is:

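```python
import numpy as np

values = np.loadtxt('data.txt', delimiter=',', usecols=[0, 1, 2, 3])  # the four float columns
labels = np.loadtxt('data.txt', delimiter=',', usecols=[4], dtype=str)  # default dtype is float, so ask for str
```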
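A variation on the workaround above, sketched under the assumption of a recent NumPy on Python 3: reading everything with dtype=str returns exactly the all-string array asked for in the question, and the numeric part can then be converted with astype(float), so the file is read only once. The sample rows come from an io.StringIO to keep the sketch self-contained; the names SAMPLE, raw and setosa are only illustrative.

```python
import io

import numpy as np

# The rows from the question; "data.txt" can be passed instead.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

# dtype=str keeps every field as text: a 2-D array of strings, one row per line.
raw = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", dtype=str)

# Split it into a float matrix and a label vector.
values = raw[:, :4].astype(float)
labels = raw[:, 4]

# With separate arrays, a boolean mask selects the rows of one class.
setosa = values[labels == "Iris-setosa"]
print(raw[0])
print(setosa.mean(axis=0))
```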

answered Oct 02 '22 13:10

mauve
