
`TypeError: invalid type promotion` when appending to a heterogeneous numpy array

I have created an array with:

Ticket_data = np.empty((0,7),
                       dtype='str,datetime64[m],datetime64[m],str,str,str,str')

and I am trying to append data with:

lineitem = [str(data[0][0]), OpenDT, CloseDT, str(data[0][11]),
            str(data[0][12]), str(data[0][13]), str(data[0][14])]

Where OpenDT and CloseDT were created with np.datetime64(DTstring, 'm')

I am getting the error:

Traceback (most recent call last):
  File "Daily Report.py", line 25, in <module>
    np.append(Ticket_data, np.array([lineitem]), axis=0)
  File "C:\Python27\lib\site-packages\numpy\lib\function_base.py", line 3884, in append
    return concatenate((arr, values), axis=axis)
TypeError: invalid type promotion

Edit:

print np.array([lineitem])

outputs

[['21539' '2015-06-30T10:46-0700' '2015-06-30T10:55-0700' 'Testtext'
 'Testtext2' 'Testtext3' 'Testtext5']]

and

print np.array([lineitem], dtype=Ticket_data.dtype)

outputs

[[('', 245672259890L, datetime.datetime(1970, 1, 1, 0, 0), '', '', '', '')
  ('', datetime.datetime(2015, 6, 30, 17, 46), datetime.datetime(1970, 1, 1, 0, 0), '', '', '', '')
  ('', datetime.datetime(2015, 6, 30, 17, 55), datetime.datetime(1970, 1, 1, 0, 0), '', '', '', '')
  ('', 7741528753124368710L, datetime.datetime(1982, 11, 21, 6, 33), '', '', '', '')
  ('', 7959953343691844691L, datetime.datetime(1970, 1, 1, 0, 0), '', '', '', '')
  ('', datetime.datetime(5205, 7, 21, 7, 42), datetime.datetime(1970, 1, 1, 0, 0), '', '', '', '')
  ('', 2336635297857499728L, 2338042681633169744L, '', '', '', '')]]

What can I do to resolve this?

Mark Omo asked Jul 01 '15 22:07


1 Answer

First, fields in a structured array are not the same thing as dimensions in a regular ndarray. You want your Ticket_data array to be 1-dimensional, with each element along that dimension containing 7 fields, e.g.:

Ticket_data = np.empty((0,),
                       dtype='str,datetime64[m],datetime64[m],str,str,str,str')
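
To make the distinction between fields and dimensions concrete, here is a minimal sketch. The field names and string widths are hypothetical, chosen only for illustration; the question's dtype uses unnamed fields and bare 'str' entries.

```python
import numpy as np

# Hypothetical field names and string widths (not from the question).
ticket_dtype = np.dtype([
    ("ticket_id", "U16"),
    ("opened", "datetime64[m]"),
    ("closed", "datetime64[m]"),
])

# One dimension; every element along it is a record with three fields.
tickets = np.empty((0,), dtype=ticket_dtype)

print(tickets.ndim)             # number of dimensions, not number of fields
print(tickets.dtype.names)      # the field names live on the dtype
print(tickets["opened"].dtype)  # each field keeps its own dtype
```

Indexing with a field name returns a view of that field across all records, so the field count never shows up in the array's shape.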

Before lineitem can be concatenated to Ticket_data, it has to be converted from a list to an array. Since you don't specify a separate dtype for each field, numpy treats lineitem as a homogeneous array and looks for a single common dtype that every element can be safely promoted to.

For example, using the values from your question:

lineitem = ['21539', np.datetime64('2015-06-30T10:46-0700', 'm'),
            np.datetime64('2015-06-30T10:55-0700', 'm'), 'Testtext',
            'Testtext2', 'Testtext3', 'Testtext5']

np.array(lineitem)
# array(['21539', '2015-06-30T10:46-0700', '2015-06-30T10:55-0700',
#        'Testtext', 'Testtext2', 'Testtext3', 'Testtext5'],
#       dtype='|S21')

In this example, every element is cast to a 21-character string. The dtype of this array does not match that of Ticket_data, and since there is no safe way to cast '|S21' to 'datetime64[m]' you get an invalid type promotion error.

You could avoid the error by explicitly casting lineitem to an array, specifying the correct dtypes for each field:

np.array([tuple(lineitem)], dtype=Ticket_data.dtype)

Note that I'm casting lineitem to a tuple - this is necessary in order for the elements in lineitem to be interpreted as separate fields rather than separate elements. The result is an array of shape (1,) (not (1, 7)):

np.array([tuple(lineitem)], dtype=Ticket_data.dtype).shape
# (1,)

If I don't cast lineitem to a tuple then I get a (1, 7) array, where each individual element in lineitem is interpreted as a sequence of 'str,datetime64[m],datetime64[m],str,str,str,str', resulting in the nonsense you showed in your edit.

The result can then be concatenated to Ticket_data.
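
Putting the steps together, a minimal sketch might look like the following. It is written for Python 3 and a recent NumPy, and it assumes named fields with fixed string widths (a bare 'str' field declares no length, so the widths here are my own choice); only four fields are shown to keep it short.

```python
import numpy as np

# Hypothetical field names and string widths, chosen for illustration.
ticket_dtype = np.dtype([
    ("ticket_id", "U16"),
    ("opened", "datetime64[m]"),
    ("closed", "datetime64[m]"),
    ("note", "U32"),
])

# 1-D structured array with zero records to start with.
ticket_data = np.empty((0,), dtype=ticket_dtype)

lineitem = [
    "21539",
    np.datetime64("2015-06-30T17:46", "m"),
    np.datetime64("2015-06-30T17:55", "m"),
    "Testtext",
]

# A tuple is read as one record; a plain list would be read as separate elements.
row = np.array([tuple(lineitem)], dtype=ticket_dtype)

# concatenate returns a new array, so the result has to be assigned back.
ticket_data = np.concatenate((ticket_data, row))

print(ticket_data.shape)
print(ticket_data["opened"])
```

Note that np.append and np.concatenate never modify their inputs, so a call whose return value is discarded leaves the original array empty.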


As an aside, I strongly recommend using pandas instead of structured arrays for dealing with heterogeneous data such as this.
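
As a rough idea of what the pandas route could look like, here is a small sketch. The column names are hypothetical and the single row reuses values from the question; it is not a drop-in replacement for the original script.

```python
import pandas as pd

# Hypothetical column names; one row built from the question's sample values.
rows = [
    {
        "ticket_id": "21539",
        "opened": "2015-06-30T17:46",
        "closed": "2015-06-30T17:55",
        "note": "Testtext",
    },
]

tickets = pd.DataFrame(rows)

# Each column gets its own dtype, so mixed text and timestamps are not a problem.
tickets["opened"] = pd.to_datetime(tickets["opened"])
tickets["closed"] = pd.to_datetime(tickets["closed"])

# Column-wise arithmetic on timestamps yields timedeltas.
tickets["duration"] = tickets["closed"] - tickets["opened"]

print(tickets.dtypes)
```

Collecting rows in a plain Python list and building the DataFrame once at the end also avoids repeatedly reallocating an array on every append.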

ali_m answered Sep 21 '22 05:09