Keras predict() returns a better accuracy than evaluate()

I set up a model with Keras, trained it on a dataset of 3 records, and then tested the resulting model with both evaluate() and predict(), using the same test set for both functions (the test set has 100 records, none of which appear in the training set, for whatever that is worth given the size of the two datasets). The dataset consists of 5 files: 4 of them each hold the readings of a different temperature sensor, which collects 60 measurements per minute (each row contains 60 measurements), and the last file contains the class labels that I want to predict (3 classes: 3, 20 or 100).

This is the model I'm using:

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, GlobalAveragePooling1D, Dropout, Dense

n_sensors, t_periods = 4, 60

model = Sequential()
# Input: 60 time steps, one channel per sensor
model.add(Conv1D(100, 6, activation='relu', input_shape=(t_periods, n_sensors)))
model.add(Conv1D(100, 6, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(160, 6, activation='relu'))
model.add(Conv1D(160, 6, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
# One softmax output per class
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I train it with: self.model.fit(X_train, y_train, batch_size=3, epochs=5, verbose=1)

Then I call evaluate(): self.model.evaluate(x_test, y_test, verbose=1)

And predict():

predictions = self.model.predict(data)
# Position of the largest softmax output for the first (and only) sample
class_index = int(np.argmax(predictions[0]))
# Map the output position back to the class label
return ('3', '20', '100')[class_index]

I compare each predicted class with the actual label and then compute correct guesses / total examples, which should be equivalent to the accuracy reported by evaluate(). Here's the code:

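correct = 0
for profile in self.profile_file:  # profile_file is an opened file
    # Read the matching line from each sensor file
    ts1 = self.ts1_file.readline()
    ts2 = self.ts2_file.readline()
    ts3 = self.ts3_file.readline()
    ts4 = self.ts4_file.readline()
    data = ts1, ts2, ts3, ts4
    test_data = self.dl.transform(data)  # see the last block of code I posted
    # Presumably the wrapper shown above, which returns '3', '20' or '100'
    prediction = self.model.predict(test_data)
    # Assumption: the label is the first tab-separated field of the profile line
    label = profile.split('\t')[0].strip()
    if prediction == label:
        correct += 1
acc = correct / 100  # 100 is the number of total examples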
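
For a like-for-like comparison with evaluate(), the per-record loop above can be replaced by a single pass over all predictions. The following is only a minimal sketch in plain Python, so it runs without Keras installed; `probs` stands in for the rows returned by predict() and `one_hot` for the labels passed to evaluate(), and both hold made-up values for illustration.

```python
def argmax(row):
    """Index of the largest value in row (the first one on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy_from_probs(probs, one_hot):
    """Fraction of rows whose predicted class equals the labelled class."""
    hits = sum(argmax(p) == argmax(y) for p, y in zip(probs, one_hot))
    return hits / len(probs)


# Hypothetical softmax outputs for 4 records and their one-hot labels.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
one_hot = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]

acc = accuracy_from_probs(probs, one_hot)
print(acc)  # 3 of the 4 rows agree, so this prints 0.75
```

If this figure, computed on the same x_test array that is given to evaluate(), matches the accuracy evaluate() reports, the gap most likely comes from how the single-record inputs are prepared rather than from predict() itself.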

The data fed to evaluate() comes from this function:

label = pd.read_csv(os.path.join(self.testDir, 'profile.txt'), sep='\t', header=None)
label = np_utils.to_categorical(label[0].factorize()[0])
# Note the file order: TS2 comes before TS1 here
data = [os.path.join(self.testDir, name)
        for name in ('TS2.txt', 'TS1.txt', 'TS3.txt', 'TS4.txt')]
df = pd.DataFrame()
for txt in data:
    read_df = pd.read_csv(txt, sep='\t', header=None)
    df = df.append(read_df)
df = df.apply(self.__predict_scale)
df = df.sort_index().values.reshape(-1,4,60).transpose(0,2,1)
return df, label
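
One detail worth checking in the loader above: by default, pandas `factorize()` numbers values in order of first appearance, so column 0 of the one-hot labels corresponds to whichever class occurs first in profile.txt, which is not necessarily class 3. Below is a minimal sketch of that coding in plain Python; the helper name `first_appearance_codes` and the label sequence are made up for illustration.

```python
def first_appearance_codes(values):
    """Code each value by its order of first appearance.

    Mimics the default (unsorted) behaviour of pandas factorize();
    this is an illustrative stand-in, not the pandas implementation.
    """
    seen = {}
    codes = [seen.setdefault(v, len(seen)) for v in values]
    return codes, list(seen)


# Hypothetical label column in which class 100 happens to come first.
labels = [100, 100, 3, 20, 3]
codes, uniques = first_appearance_codes(labels)
print(codes)    # [0, 0, 1, 2, 1]
print(uniques)  # [100, 3, 20]
```

With a file ordered like this, output index 0 would mean class 100, so a hard-coded mapping of index 0 to '3' would mislabel predictions. Whether that applies here depends on the actual order of labels in profile.txt.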

The data fed to predict(), on the other hand, comes from this other function:

df = pd.DataFrame()
for txt in data:  # data is the (ts1, ts2, ts3, ts4) tuple of lines read in the loop above
    read_df = pd.read_csv(StringIO(txt), sep='\t', header=None)
    df = df.append(read_df)
df = df.apply(self.__predict_scale)
df = df.sort_index().values.reshape(-1,4,60).transpose(0,2,1)
return df

The accuracies yielded by evaluate() and predict() are always different: the largest gap I observed was 78% accuracy from evaluate() against 95% from predict(). The only difference between the two is that I make predict() work on one example at a time, while evaluate() takes the entire dataset at once, which should make no difference. How can that be?

UPDATE 1: It seems that the problem is in how I prepare my data. For predict(), I transform only one line at a time from each file using the last block of code I posted, while for evaluate() I transform the entire files using the other function shown. Why should that be different? It seems to me that I'm applying the exact same transformation; the only difference is the number of rows transformed.
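
A possible explanation, sketched under an assumption: the body of `__predict_scale` is not shown, but if it scales each column using that column's own statistics (as a min-max or standard scaler applied through `df.apply` would), then the result depends on which rows are in the frame. The function `min_max_scale` and the readings below are hypothetical stand-ins, not the code from the question.

```python
def min_max_scale(column):
    """Scale values to [0, 1] using this column's own min and max.

    Hypothetical stand-in for __predict_scale, whose body is not shown.
    """
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# One column (one time step) for two records, one value per sensor.
record_0 = [20.0, 24.0, 28.0, 30.0]
record_1 = [10.0, 35.0, 40.0, 50.0]

# predict() path: the frame holds a single record.
scaled_alone = min_max_scale(record_0)
# evaluate() path: the frame holds every record; keep record 0's values.
scaled_in_file = min_max_scale(record_0 + record_1)[:4]

print(scaled_alone)    # [0.0, 0.4, 0.8, 1.0]
print(scaled_in_file)  # [0.25, 0.35, 0.45, 0.5]
```

The same raw readings end up as different model inputs in the two paths, so the two accuracies need not agree. Fitting the scaling parameters once (for instance on the training data) and reusing them in both paths would remove that difference, assuming the scaler behaves as sketched here.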

asked Sep 09 '19 23:09

DDD

1 Answer

This question was already answered here

What happens is that when you evaluate the model, since your loss function is categorical_crossentropy, metrics=['accuracy'] resolves to categorical_accuracy.

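predict(), on the other hand, computes no metric of its own: it only returns the model outputs, and comparing those outputs with the labels element by element amounts to binary_accuracy.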

So essentially you are calculating categorical accuracy with evaluate() and binary accuracy with predict(); this is the reason they are so widely different.

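The difference between categorical_accuracy and binary_accuracy is that categorical_accuracy checks whether the largest output sits at the same position as the 1 in your y_test, so each row counts as entirely right or entirely wrong, while binary_accuracy checks each of your outputs against y_test separately and averages the matches.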

Example (single row):

prediction = [0,0,1,1,0]
y_test = [0,0,0,1,0]

categorical_accuracy = 0%

The row counts as wrong as a whole because the prediction does not single out the labelled class, so categorical_accuracy is 0.

binary_accuracy = 80%

Even though 1 output doesn't match, the other 4 out of 5 do, so the accuracy is 80%.
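
The single-row example above can be reproduced in plain Python. The two helpers below are simplified stand-ins for the Keras metrics of the same names (an argmax comparison, and per-output thresholding at 0.5); they are written here for illustration and are not taken from the Keras source.

```python
def categorical_accuracy(y_true, y_pred):
    """1.0 if the largest output is at the label's position, else 0.0."""
    def pick(row):
        # Index of the largest value (the first one on ties).
        return max(range(len(row)), key=lambda i: row[i])
    return float(pick(y_true) == pick(y_pred))


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Share of outputs that match the label after thresholding each one."""
    matches = [int(p > threshold) == t for t, p in zip(y_true, y_pred)]
    return sum(matches) / len(matches)


prediction = [0, 0, 1, 1, 0]
y_test = [0, 0, 0, 1, 0]

cat_acc = categorical_accuracy(y_test, prediction)
bin_acc = binary_accuracy(y_test, prediction)
print(cat_acc)  # 0.0
print(bin_acc)  # 0.8
```

Note that this only contrasts the two metrics; a comparison based on the largest output, such as picking the class with np.amax or np.argmax, behaves like the categorical variant.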

answered Nov 14 '22 23:11

sparkles

Tags: python, machine-learning, numpy, tensorflow, keras
