TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('

Question

Strange error from numpy via matplotlib when trying to get a histogram of a tiny toy dataset. I'm just not sure how to interpret the error, which makes it hard to see what to do next.

Didn't find much related, though this nltk question and this gdsCAD question are superficially similar.

I intend the debugging info at bottom to be more helpful than the driver code, but if I've missed something, please ask. This is reproducible as part of an existing test suite.

        if n > 1:
            return diff(a[slice1]-a[slice2], n-1, axis=axis)
        else:
>           return a[slice1]-a[slice2]
E           TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

../py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py:1567: TypeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(1567)diff()
-> return a[slice1]-a[slice2]
(Pdb) bt
[...]
py2.7.11-venv/lib/python2.7/site-packages/matplotlib/axes/_axes.py(5678)hist()
-> m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
  py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(606)histogram()
-> if (np.diff(bins) < 0).any():
> py2.7.11-venv/lib/python2.7/site-packages/numpy/lib/function_base.py(1567)diff()
-> return a[slice1]-a[slice2]
(Pdb) p numpy.__version__
'1.11.0'
(Pdb) p matplotlib.__version__
'1.4.3'
(Pdb) a
a = [u'A' u'B' u'C' u'D' u'E']
n = 1
axis = -1
(Pdb) p slice1
(slice(1, None, None),)
(Pdb) p slice2
(slice(None, -1, None),)
(Pdb)

Muhammad Umar Amanat · Accepted Answer

I got the same error, but in my case I am subtracting dict.key from dict.value. I have fixed this by subtracting dict.value for corresponding key from other dict.value.

cosine_sim = cosine_similarity(e_b-e_a, w-e_c)

here I got error because e_b, e_a and e_c are embedding vector for word a,b,c respectively. I didn't know that 'w' is string, when I sought out w is string then I fix this by following line:

cosine_sim = cosine_similarity(e_b-e_a, word_to_vec_map[w]-e_c)

Instead of subtracting dict.key, now I have subtracted corresponding value for key

colster · Answer

I had a similar issue where an integer in a row of a DataFrame I was iterating over was of type numpy.int64. I got the

TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

error when trying to subtract a float from it.

The easiest fix for me was to convert the row using pd.to_numeric(row).

hpaulj · Answer

Why is it applying diff to an array of strings.

I get an error at the same point, though with a different message

In [23]: a=np.array([u'A' u'B' u'C' u'D' u'E'])

In [24]: np.diff(a)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-9d5a62fc3ff0> in <module>()
----> 1 np.diff(a)

C:\Users\paul\AppData\Local\Enthought\Canopy\User\lib\site-packages
umpy\lib\function_base.pyc in diff(a, n, axis)
   1112         return diff(a[slice1]-a[slice2], n-1, axis=axis)
   1113     else:
-> 1114         return a[slice1]-a[slice2]
   1115 
   1116 

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

Is this a array the bins parameter? What does the docs say bins should be?

James · Answer

I am fairly new to this myself, but I had a similar error and found that it is due to a type casting issue. I was trying to concatenate rather than take the difference but I think the principle is the same here. I provided a similar answer on another question so I hope that is OK.

In essence you need to use a different data type cast, in my case I needed str not float, I suspect yours is the same so my suggested solution is. I am sorry I cannot test it before suggesting but I am unclear from your example what you were doing.

return diff(str(a[slice1])-str(a[slice2]), n-1, axis=axis)

Please see my example code below for the fix to my code, the change occurs on the third to last line. The code is to produce a basic random forest model.

import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation

Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Ues the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"

npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay =  npArray.shape

names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1] .astype(float)
X = preprocessing.scale(X)

XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)

# Predictions results initialised 
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain)       # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)

with open(test_name,'a') as fpred :
    lenpredictions = len(RFpreds)
    lentrue = yTest.shape[0]
    if lenpredictions == lentrue :
            fpred.write("Names/Label,, Prediction Random Forest,, True Value,
")
            for i in range(0,lenpredictions) :
                    fpred.write(RFpreds[i]+",,"+yTest[i]+",
")
    else :
            print "ERROR - names, prediction and true value array size mismatch."

This leads to an error of;

Traceback (most recent call last):
  File "min_example.py", line 40, in <module>
    fpred.write(RFpreds[i]+",,"+yTest[i]+",
")
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')

The solution is to make each variable a str() type on the third to last line then write to file. No other changes to then code have been made from the above.

import scipy
import math
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn import preprocessing, metrics, cross_validation

Data = pd.read_csv("Free_Energy_exp.csv", sep=",")
Data = Data.fillna(Data.mean()) # replace the NA values with the mean of the descriptor
header = Data.columns.values # Ues the column headers as the descriptor labels
Data.head()
test_name = "Test.csv"

npArray = np.array(Data)
print header.shape
npheader = np.array(header[1:-1])
print("Array shape X = %d, Y = %d " % (npArray.shape))
datax, datay =  npArray.shape

names = npArray[:,0]
X = npArray[:,1:-1].astype(float)
y = npArray[:,-1] .astype(float)
X = preprocessing.scale(X)

XTrain, XTest, yTrain, yTest = cross_validation.train_test_split(X,y, random_state=0)

# Predictions results initialised 
RFpredictions = []
RF = RandomForestRegressor(n_estimators = 10, max_features = 5, max_depth = 5, random_state=0)
RF.fit(XTrain, yTrain)       # Train the model
print("Training R2 = %5.2f" % RF.score(XTrain,yTrain))
RFpreds = RF.predict(XTest)

with open(test_name,'a') as fpred :
    lenpredictions = len(RFpreds)
    lentrue = yTest.shape[0]
    if lenpredictions == lentrue :
            fpred.write("Names/Label,, Prediction Random Forest,, True Value,
")
            for i in range(0,lenpredictions) :
                    fpred.write(str(RFpreds[i])+",,"+str(yTest[i])+",
")
    else :
            print "ERROR - names, prediction and true value array size mismatch."

These examples are from a larger code so I hope the examples are clear enough.

TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

Tags:

matplotlib

python-unicode

numpy

Gregory Marton

4 Answers

Muhammad Umar Amanat

colster

hpaulj

James

Recent Activity

Donate For Us