
Numpy dtype for list with mixed data types

Tags: python, numpy

I have a list, my_list, with mixed data types that I want to convert into a numpy array. However, I get the error TypeError: expected a readable buffer object. See code below. I've tried to base my code on the NumPy documentation.

my_list = [['User_0', '2012-2', 1, 6, 0, 1.0], ['User_0', '2012-2', 5, 6, 0, 1.0], ['User_0', '2012-3', 0, 0, 4, 1.0]]
my_np_array = np.array(my_list, dtype='S30, S8, i4, i4, f32')   
pir asked Jun 09 '15 18:06




2 Answers

Why not use dtype=object?

In [1]: my_list = [['User_0', '2012-2', 1, 6, 0, 1.0],
   ...:            ['User_0', '2012-2', 5, 6, 0, 1.0],
   ...:            ['User_0', '2012-3', 0, 0, 4, 1.0]]
In [2]: my_np_array = np.array(my_list, dtype=object)
In [3]: my_np_array
Out[3]:
array([['User_0', '2012-2', 1, 6, 0, 1.0],
       ['User_0', '2012-2', 5, 6, 0, 1.0],
       ['User_0', '2012-3', 0, 0, 4, 1.0]], dtype=object)

Note: the trade-off is memory usage. When you specify a dtype for each column, the values are stored compactly in the array's own buffer. With dtype=object, the array stores only references to ordinary Python objects, so every element carries the full overhead of a Python object and the total memory used is generally higher.
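The difference can be inspected directly. The following is a minimal sketch on the question's data, not a benchmark; the names obj_arr, rec_arr and payload are illustrative, and the structured dtype assumes tuples as rows and f4 for the float field. The byte count for the object array's elements is a rough upper bound, since small integers and repeated strings may be shared between elements.

```python
import sys
import numpy as np

my_list = [['User_0', '2012-2', 1, 6, 0, 1.0],
           ['User_0', '2012-2', 5, 6, 0, 1.0],
           ['User_0', '2012-3', 0, 0, 4, 1.0]]

# Object array: the buffer holds one reference per element.
obj_arr = np.array(my_list, dtype=object)

# Structured array: each row is one packed record (rows must be tuples).
rec_arr = np.array([tuple(row) for row in my_list],
                   dtype='S30, S8, i4, i4, i4, f4')

# nbytes counts only the array buffer, so for obj_arr it excludes the
# Python objects that the references point to.
payload = sum(sys.getsizeof(x) for x in obj_arr.flat)

print("object array buffer:", obj_arr.nbytes, "bytes")
print("referenced objects (rough upper bound):", payload, "bytes")
print("structured array buffer:", rec_arr.nbytes, "bytes")
print("bytes per record:", rec_arr.itemsize)  # 30 + 8 + 4 + 4 + 4 + 4 = 54
```

Note that the two arrays also have different shapes: obj_arr is a 3x6 grid of objects, while rec_arr is a one-dimensional array of 3 records.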

farhawa answered Nov 04 '22 14:11


Your nested items should be tuples, and you omitted one i4 from your dtype. Note also that the code for a 4-byte float is f4, as the resulting dtype below shows:

>>> my_np_array = np.array(list(map(tuple, my_list)), dtype='|S30, |S8, i4, i4, i4, f4')
>>> my_np_array
array([('User_0', '2012-2', 1, 6, 0, 1.0),
       ('User_0', '2012-2', 5, 6, 0, 1.0),
       ('User_0', '2012-3', 0, 0, 4, 1.0)], 
      dtype=[('f0', 'S30'), ('f1', 'S8'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<i4'), ('f5', '<f4')])

As far as I know, NumPy uses tuples to represent the records of a structured array, so when the dtype has multiple fields you need to convert each sub-list to a tuple with one element per dtype field.
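For reference, here is a sketch of the same conversion with named fields, written for Python 3. The field names (user, period, a, b, c, score) are hypothetical, since the question does not say what the columns mean, and U (unicode) string fields are used instead of S so the values come back as str rather than bytes.

```python
import numpy as np

my_list = [['User_0', '2012-2', 1, 6, 0, 1.0],
           ['User_0', '2012-2', 5, 6, 0, 1.0],
           ['User_0', '2012-3', 0, 0, 4, 1.0]]

# Hypothetical field names; one (name, type) pair per column.
dt = np.dtype([('user', 'U30'), ('period', 'U8'),
               ('a', 'i4'), ('b', 'i4'), ('c', 'i4'),
               ('score', 'f4')])

# Each row becomes a tuple, i.e. one record of the structured array.
records = np.array([tuple(row) for row in my_list], dtype=dt)

print(records.shape)       # one-dimensional: one entry per row
print(records['user'])     # a whole column, selected by field name
print(records['a'].sum())  # numeric fields behave like ordinary arrays
print(records[0])          # a single record
```

Naming the fields avoids the default names f0 to f5 and lets each column be used as a regular typed array.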

Mazdak answered Nov 04 '22 15:11