
Remove dtype at the end of numpy array

I'm writing a method to create an array from a data file. The method looks like this:

import numpy

def readDataFile(fileName):
    try:
        with open(fileName, 'r') as inputs:
            data = None
            for line in inputs:
                line = line.strip()
                items = line.split('\t')
                if data is None:
                    data = numpy.array(items)
                else:
                    data = numpy.vstack((data, items))
            # return only after every line has been read
            return numpy.array(data)
    except IOError as ioerr:
        print('IOError: %s' % ioerr)
        return None

My data file contains lines of numbers separated by tabs, e.g.:

1 2 3
4 5 6
7 8 9

And I expect to receive an array as follows:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

However, the result contains dtype at the end of it:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]], dtype='|S9')

Because of this, I cannot perform some operations on the result. For example, if I try to find the max value of each column using result.max(0), I get an error:

TypeError: cannot perform reduce with flexible type.

So, can anyone tell me what's wrong with my code and how to fix it? Thanks a lot.

Long Thai asked Apr 23 '12 21:04



2 Answers

The easiest fix is to use numpy's loadtxt:

data = numpy.loadtxt(fileName, dtype='float')
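As a minimal sketch of the loadtxt approach, assuming the file holds only tab-separated numbers as in the question: the example below is written for Python 3 and reads the same three rows from an in-memory buffer. The io.StringIO object and the name sample are stand-ins for the real file, used only to keep the example self-contained.

```python
import io
import numpy as np

# Stand-in for the tab-separated data file from the question (hypothetical contents).
sample = io.StringIO("1\t2\t3\n4\t5\t6\n7\t8\t9\n")

# loadtxt splits on any whitespace by default and converts each field to the requested dtype.
data = np.loadtxt(sample, dtype=float)

# Reductions such as max work because the dtype is numeric, not a string type.
col_max = data.max(0)
print(data.dtype)
print(col_max)
```

Passing a file name instead of the buffer should behave the same way for a file with this layout.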

Just FYI, using numpy.vstack inside a loop is a bad idea. If you decide not to use loadtxt, you can replace your loop with the following, which fixes the dtype issue and eliminates the numpy.vstack call.

# inputs is the open file object from the question's code
data = [row.strip().split('\t') for row in inputs]
data = numpy.array(data, dtype='float')
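The original array ends up with a string dtype because str.split returns strings, and NumPy stores them as strings unless told otherwise. A small sketch of the list-based approach follows, written for Python 3. The helper name parse_rows and the in-memory list of lines are hypothetical, chosen only for illustration.

```python
import numpy as np

def parse_rows(lines):
    """Convert tab-separated lines of numbers into a 2-D float array.

    parse_rows is a hypothetical helper name used only for this illustration.
    """
    # Strip the trailing newline, skip blank lines, and split on tabs.
    rows = [line.strip().split('\t') for line in lines if line.strip()]
    # Convert once at the end; dtype=float parses the strings as numbers.
    return np.array(rows, dtype=float)

# Stand-in for iterating over the open file from the question.
lines = ["1\t2\t3\n", "4\t5\t6\n", "7\t8\t9\n"]
result = parse_rows(lines)
print(result.max(0))
```

Because an open file object is itself an iterable of lines, it could be passed to such a helper directly in place of the list.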

Update

Every time vstack is called, it creates a new array and copies the contents of the old arrays into it. Each copy is roughly O(n), where n is the size of the array, so if your loop runs n times the whole thing becomes O(n**2), in other words slow. If you know the final size of the array ahead of time, it's better to create the array outside the loop and fill it in place. If you don't know the final size, you can append to a list inside the loop and call vstack once at the end. For example:

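import numpy as np

# known final size: allocate once, then fill row by row
myArray = np.zeros((10, 3))
for i in range(len(myArray)):
    myArray[i] = [i, i+1, i+2]

# or, unknown final size: collect rows in a list and stack once
myArray = []
for i in range(10):
    myArray.append(np.array([i, i+1, i+2]))
myArray = np.vstack(myArray)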
Bi Rico answered Sep 25 '22 14:09


NumPy arrays have a tolist() method that returns the contents as a plain Python list, without the dtype:

import numpy as np
a = np.array(['A', 'B'])
a
# Returns: array(['A', 'B'], dtype='|S1')

a.tolist()
# Returns ['A', 'B']

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tolist.html#numpy.ndarray.tolist
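One caveat, offered as an observation rather than as part of the answer: tolist() changes the container but not the element type, so strings stay strings. If numeric operations are the goal, astype is the usual way to get a numeric array. A short sketch for Python 3, using made-up values:

```python
import numpy as np

a = np.array(['1', '2', '3'])  # string dtype, as in the question

as_list = a.tolist()           # plain Python list, but the items are still str
as_numbers = a.astype(int)     # new array with an integer dtype

print(as_list)
print(as_numbers.max())
```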

Enrique Pérez Herrero answered Sep 25 '22 14:09