I am trying to compute a metric with pandas DataFrames. In particular, I get predictions from a results object:
prediction = results.predict(start=1,end=len(test),exog=test)
The actual values are in a DataFrame column given by
test['actual'].
I need to compute two things:
How can I compute the sum of squares of errors? So basically, I would be doing an element by element subtraction and then summing the squares of these.
How can I compute the sum of squares of the predicted minus the mean of the actual values? So it would be
(x1-mean_actual)^2 + (x2-mean_actual)^2...+(xn-mean_actual)^2
The first one (the sum of squared errors) would be:
((prediction - test['actual']) ** 2).sum()
The second one (the sum of squares of the predictions minus the mean of the actual values) would be:
((prediction - test['actual'].mean()) ** 2).sum()
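One thing to watch for: pandas aligns two Series on their index labels when you subtract them, so if prediction and test['actual'] carry different indexes, the first expression can produce NaN values instead of the element-by-element difference. Here is a minimal sketch of both calculations on made-up numbers. The data, the prediction Series, and the variable names sse and ssr are assumptions for illustration only, not output from your model.

```python
import pandas as pd

# Hypothetical stand-ins for test['actual'] and the model predictions.
test = pd.DataFrame({"actual": [3.0, 5.0, 7.0, 9.0]})
prediction = pd.Series([2.5, 5.5, 6.0, 9.5])

# Subtract the underlying arrays so the comparison is positional,
# regardless of what index labels the two Series carry.
errors = prediction.to_numpy() - test["actual"].to_numpy()
sse = (errors ** 2).sum()  # sum of squared errors

# Subtracting a scalar mean needs no alignment, so the Series can be used directly.
mean_actual = test["actual"].mean()
ssr = ((prediction - mean_actual) ** 2).sum()

print(sse, ssr)
```

If the two indexes already match, the two one-liners above give the same result as this sketch.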