Numpy dtype - data type not understood


Tags: python, pandas, numpy

I have a dataframe and I am looking at the data types associated with each column.

When I run:

In [23]: df.dtype.descr

Out [24]: [(u'date', '<i8'), (u'open', '<f8'), (u'high', '<f8'), (u'low', '<f8'), (u'close', '<f8'), (u'volume', '<f8'), (u'dividend', '<f8'), (u'adj_factor', '<f8'), (u'split_factor', '<f8'), (u'liq', '<f8'), (u'currency', '|O')]

I want to set the currency dtype to S7. I am doing:

In [25]: dtype_new[-1] = (u'currency', "|S7")
In [26]: print dtype_new
Out [27]: [(u'date', '<i8'), (u'open', '<f8'), (u'high', '<f8'), (u'low', '<f8'), (u'close', '<f8'), (u'volume', '<f8'), (u'dividend', '<f8'), (u'adj_factor', '<f8'), (u'split_factor', '<f8'), (u'liq', '<f8'), (u'currency', '|S7')]

The format looks correct, so I try to apply it back to my df:

In [28]: df = df.astype(np.dtype(dtype_new))

And I get the error:

TypeError('data type not understood',)

What should I change? This was working before I recently updated Anaconda, and I don't know what changed. Thanks.

ADJUSTMENT:

df.dtype is

In [23]: records.dtype
Out[23]: dtype((numpy.record, [(u'date', '<i8'), (u'open', '<f8'), (u'high', '<f8'), (u'low', '<f8'), (u'close', '<f8'), (u'volume', '<f8'), (u'dividend', '<f8'), (u'adj_factor', '<f8'), (u'split_factor', '<f8'), (u'liq', '<f8'), (u'currency', 'O')]))

How can I change the last dtype from 'O' to something else? Specifically, to a string of up to 7 characters (S7).

LASTLY - is this a unicode issue? With Unicode:

In [38]: np.dtype([(u'date', '<i8')])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-8702f0c7681f> in <module>()
----> 1 np.dtype([(u'date', '<i8')])

TypeError: data type not understood

No Unicode:

In [39]: np.dtype([('date', '<i8')])
Out[39]: dtype([('date', '<i8')])


asked Sep 20 '17 18:09

user1911092

1 Answer

You have correctly identified unicode as the cause, and in doing so you have touched on a sore point.

Let's start from the latest NumPy documentation.

The documentation on dtypes states that:

[(field_name, field_dtype, field_shape), ...]

obj should be a list of fields where each field is described by a tuple of length 2 or 3. (Equivalent to the descr item in the __array_interface__ attribute.)

The first element, field_name, is the field name (if this is '' then a standard field name, 'f#', is assigned). The field name may also be a 2-tuple of strings where the first string is either a “title” (which may be any string or unicode string) or meta-data for the field which can be any object, and the second string is the “name” which must be a valid Python identifier. The second element, field_dtype, can be anything that can be interpreted as a data-type. The optional third element field_shape contains the shape if this field represents an array of the data-type in the second element. Note that a 3-tuple with a third argument equal to 1 is equivalent to a 2-tuple. This style does not accept align in the dtype constructor as it is assumed that all of the memory is accounted for by the array interface description.

So the documentation does not really specify whether the field name can be unicode. What we can be sure of from the documentation is that if we define the field name as a tuple, e.g. ((u'date', 'date'), '<i8'), then using unicode for the "title" (notice, still not for the name!) leads to no errors.
Even in this form, if you define ((u'date', u'date'), '<i8'), with a unicode name, you will get an error.
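As a small illustration of the (title, name) form quoted above, here is a minimal sketch. The title "Trade date" and the two-field layout are made up for the example; it is written with plain string literals so it does not depend on how a given Python version treats unicode.

```python
import numpy as np

# Field name given as a (title, name) 2-tuple: the title is free text,
# the name is the identifier used to address the field.
dt = np.dtype([(("Trade date", "date"), "<i8"), ("open", "<f8")])

print(dt.names)           # only the real field names, not the titles
print(dt.fields["date"])  # (dtype, offset, title) for the titled field
```

The title is stored alongside the field, while dt.names only lists the names.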

In Py2 you can still start from unicode names by converting them with encode("ascii"):

(u'date'.encode("ascii"))

and this should work.
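A minimal sketch of that workaround, written so it also runs unchanged on Py3 (where encoding to bytes would itself be rejected as a field name). The helper native_name and the two-field descr list are hypothetical, introduced only for this example.

```python
import sys
import numpy as np

def native_name(name):
    """Return `name` as the interpreter's native str type (hypothetical helper)."""
    if sys.version_info[0] == 2 and not isinstance(name, str):
        # Py2: a unicode object is not a str, so encode it to a byte string.
        return name.encode("ascii")
    # Py3 (or a Py2 byte string): already the native str type.
    return name

# Field names as they might come back from dtype.descr on Py2.
descr = [(u"date", "<i8"), (u"currency", "|S7")]

# Normalise every name before handing the list to np.dtype.
dt = np.dtype([(native_name(name), fmt) for name, fmt in descr])
print(dt)
```

This sketch assumes every entry is a plain (name, format) pair; entries with a shape or a (title, name) tuple would need extra handling.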
One important point is that in Py2, NumPy does not accept unicode field names when the dtype is specified as a list of tuples, but it does accept them when the dtype is specified as a dictionary.
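For reference, the dictionary form looks like the sketch below. The three field names and formats are a shortened, illustrative subset of the question's columns, not the full record layout.

```python
import numpy as np

# Same information as a list of (name, format) tuples, but split into
# parallel lists under the 'names' and 'formats' keys.
names = [u"date", u"open", u"currency"]
formats = ["<i8", "<f8", "|S7"]

dt_from_dict = np.dtype({"names": names, "formats": formats})
print(dt_from_dict)
```

On Py3 both forms accept these names, so the distinction only matters for Py2 code.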

So in Py2, if the names are not unicode, I can change the last field from |O to |S7; if a name is defined as a unicode string, you have to apply encode("ascii") to it first.
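Once the dtype itself is accepted, the object field can be converted. The sketch below uses a small made-up structured array standing in for the question's records, and it copies field by field into a new array instead of calling astype, which is a deliberate substitution to keep each conversion explicit.

```python
import numpy as np

# Hypothetical stand-in for the question's records (only three fields).
records = np.array(
    [(20170920, 10.5, "USD"), (20170921, 11.0, "EUR")],
    dtype=[("date", "<i8"), ("open", "<f8"), ("currency", "O")],
)

# Take the existing description and swap the last field's format to S7.
new_descr = records.dtype.descr
new_descr[-1] = (new_descr[-1][0], "|S7")
new_dtype = np.dtype(new_descr)

# Copy each field into an array of the new dtype; the object strings
# are converted to fixed-width byte strings on assignment.
converted = np.empty(records.shape, dtype=new_dtype)
for name in records.dtype.names:
    converted[name] = records[name]

print(converted.dtype)
```

Note that on Py3 an S7 field holds bytes, so values longer than 7 bytes are truncated and non-ASCII text would need a unicode format such as U7 instead.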

And the bugs involved...

To understand why you see this behaviour, it is useful to look at the bugs/issues reported against NumPy and pandas and the related discussions.

Numpy
https://github.com/numpy/numpy/issues/2407
In the discussion (which I do not reproduce here) you can notice a few things:

- the "issue" has been going on for a while
- one trick people used was to call encode("ascii") on the unicode string
- remember that a plain 'whatever' string literal has different defaults (bytes/unicode) in Py2/3
- @hpaulj himself commented beautifully in that issue report that "If the dtype specification is of the list of tuples type, it checks whether each name is a string (as defined by py2 or 3) But if the dtype specification is a dictionary {'names':[ alist], 'formats':[alist]...}, the py2 case also allows unicode names"

Pandas
A related issue was also reported on the pandas side: https://github.com/pandas-dev/pandas/pull/13462
It seems to have been fixed not that long ago.


answered Oct 05 '22 11:10

fedepad
