Remove dtype at the end of numpy array

Tags:

python

arrays

numpy

I'm writing a method to create an array from a data file. The method looks like this:

import numpy
def readDataFile(fileName):
    try:
        with open(fileName, 'r') as inputs:
            data = None
            for line in inputs:
                line = line.strip()
                items = line.split('\t')
                if data == None:
                    data = numpy.array(items[0:len(items)]) 
                else:
                    data = numpy.vstack((data, items[0:len(items)]))
            return numpy.array(data)
    except IOError as ioerr:
        print 'IOError: ', ioerr
        return None

My data file contains lines of numbers separated by tabs, e.g.:

1 2 3
4 5 6
7 8 9

And I expect to receive an array as follows:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

However, the result contains dtype at the end of it:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]], dtype='|S9')

Because of this, I cannot perform some operations on the result. For example, if I try to find the max values using result.max(0), I get an error:

TypeError: cannot perform reduce with flexible type.

So, can anyone tell me what's wrong with my code and how to fix it? Thanks a lot.

400

asked Apr 23 '12 21:04

Long Thai

2 Answers

The easiest fix is to use numpy's loadtxt:

data = numpy.loadtxt(fileName, dtype='float')
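Here is a minimal sketch of that call, assuming a tab-separated file like the one in the question. The temporary file and its contents are made up for illustration, and the explicit delimiter is optional since loadtxt splits on whitespace by default.

```python
import os
import tempfile

import numpy

# Hypothetical sample file mirroring the question's tab-separated data.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1\t2\t3\n4\t5\t6\n7\t8\t9\n')
    file_name = tmp.name

# The delimiter is passed explicitly only to make the tab assumption visible.
data = numpy.loadtxt(file_name, delimiter='\t', dtype='float')
os.remove(file_name)

# Reductions along axis 0 now work because the array is numeric.
col_max = data.max(0)
```

Because the array holds floats rather than strings, data.max(0) no longer raises the "flexible type" error.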

Just FYI, using numpy.vstack inside a loop is a bad idea. If you decide not to use loadtxt, you can replace your loop with the following, which fixes the dtype issue and eliminates the numpy.vstack calls:

data = [row.split('\t') for row in inputs]
data = numpy.array(data, dtype='float')
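Put together, the question's function might look roughly like the sketch below. The name read_data_file and the blank-line filter are my own assumptions, not part of the original code, and the error handling simply mirrors the question's version.

```python
import numpy


def read_data_file(file_name):
    """Read a tab-separated numeric file into a 2-D float array.

    Returns None if the file cannot be opened, as in the question's version.
    """
    try:
        with open(file_name, 'r') as inputs:
            # Collect rows in a plain list; blank lines are skipped (assumption).
            rows = [line.strip().split('\t') for line in inputs if line.strip()]
    except IOError as ioerr:
        print('IOError: %s' % ioerr)
        return None
    # Convert once, asking for floats so the strings are parsed as numbers.
    return numpy.array(rows, dtype='float')
```

The array is built in a single conversion at the end, so no intermediate arrays are copied while reading.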

Update

Every time vstack is called it allocates a new array and copies the contents of the old arrays into it. Each copy is roughly O(n), where n is the size of the array, so a loop that runs n times costs O(n**2) overall, which gets slow quickly. If you know the final size of the array ahead of time, it's better to create the array outside the loop and fill it in place. If you don't know the final size, you can append to a list inside the loop and call vstack once at the end. For example:

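import numpy as np

# Known final size: allocate once, then fill row by row.
myArray = np.zeros((10, 3))
for i in range(len(myArray)):
    myArray[i] = [i, i+1, i+2]

# or, unknown final size: collect rows in a list and stack once at the end.
myArray = []
for i in range(10):
    myArray.append(np.array([i, i+1, i+2]))
myArray = np.vstack(myArray)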
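If you want to see the difference on your own machine, a rough timing sketch like this one should do. The function names and the sizes are arbitrary choices for illustration, and actual timings will vary, so none are quoted here.

```python
import timeit

import numpy as np


def grow_with_vstack(n):
    # Allocates a new array and copies all existing rows on every iteration.
    data = np.empty((0, 3))
    for i in range(n):
        data = np.vstack((data, [i, i + 1, i + 2]))
    return data


def build_from_list(n):
    # Builds a Python list of rows, then converts to an array once.
    rows = [[i, i + 1, i + 2] for i in range(n)]
    return np.array(rows, dtype=float)


if __name__ == '__main__':
    for n in (100, 1000, 5000):
        slow = timeit.timeit(lambda: grow_with_vstack(n), number=3)
        fast = timeit.timeit(lambda: build_from_list(n), number=3)
        print('n=%d  vstack loop: %.4fs  list then array: %.4fs' % (n, slow, fast))
```

Both functions return the same array; only the amount of copying differs.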

137

answered Sep 25 '22 14:09

Bi Rico

NumPy arrays have a tolist() method that returns the contents as a plain Python list, without the dtype:

import numpy as np
a = np.array(['A', 'B'])
a
# Returns: array(['A', 'B'],  dtype='|S1')

a.tolist()
# Returns ['A', 'B']

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tolist.html#numpy.ndarray.tolist
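One caveat: tolist() leaves the elements as strings, so it may not help with numeric operations like the max in the question. If numbers are what you need, astype is probably closer to the goal. A small sketch, using a made-up string array shaped like the question's result:

```python
import numpy as np

# A string array like the one the question's function produces.
text = np.array([['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']])

as_list = text.tolist()       # nested Python lists, elements are still strings
numbers = text.astype(float)  # new array with the strings parsed as floats

col_max = numbers.max(0)      # reductions work once the dtype is numeric
```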

answered Sep 25 '22 14:09

Enrique Pérez Herrero
