
Root-mean-square error (RMSE) is a frequently used measure of the differences between values predicted by a model and the values actually observed. It represents the sample standard deviation of the differences between predicted and observed values. RMSE aggregates the magnitudes of the errors in predictions for various data points into a single measure of predictive power.


How does one calculate RMSE?


RMSE can be calculated with the formula below.

RMSE = sqrt( (1/n) * sum_{i=1}^{n} (y_i - yhat_i)^2 )

where y_i is the i-th observed value, yhat_i is the corresponding predicted value, and n is the number of observations.
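
For example, with observed values (1, 4, 8) and predicted values (2, 4, 6), the errors are (1, 0, -2), the squared errors are (1, 0, 4), their mean is 5/3, and RMSE = sqrt(5/3) ≈ 1.29.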


The following is the code to calculate RMSE in different languages and statistical packages:


R

The rmse function from the hydroGOF package can be used to calculate the RMSE between simulated values sim and observed values obs:


library(hydroGOF)
rmse(sim, obs)


Python

The statsmodels module has a function to calculate RMSE:

statsmodels.tools.eval_measures.rmse
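
A minimal sketch of how it can be called (the array names and values here are illustrative, reusing the small worked example above):

import numpy as np
from statsmodels.tools.eval_measures import rmse

pred = np.array([2.0, 4.0, 6.0])  # illustrative predicted values
obs = np.array([1.0, 4.0, 8.0])   # illustrative observed values

print(rmse(pred, obs))  # ≈ 1.29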


sklearn.metrics has a mean_squared_error function.

RMSE is just the square root of the mean squared error.
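
A minimal sketch (y_true and y_pred are illustrative names for the observed and predicted arrays):

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [1.0, 4.0, 8.0]  # illustrative observed values
y_pred = [2.0, 4.0, 6.0]  # illustrative predicted values

mse = mean_squared_error(y_true, y_pred)  # mean squared error
print(np.sqrt(mse))                       # square root gives RMSE, ≈ 1.29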


Matlab

RMSE = sqrt(mean((y - y_pred).^2))  % y: observed values, y_pred: predicted values



The primary advantage of RMSE is that it is very simple to explain, which is a big reason it is so widely used and so widely understood. The result RMSE produces is in the same units as the variable being modelled. RMSE is very useful in applications where one is concerned with the model's ability to predict values of the variable in an absolute sense; in other words, how well the predicted values correspond to the actual ones.


There are two main problems with focusing exclusively on RMSE to evaluate a model's effectiveness. First, the root mean squared error is more sensitive to outliers than other measures: because the errors are squared, large errors are weighted more heavily than small ones. RMSE can therefore be quite misleading when the data contain outliers or the error distribution is heavy-tailed.
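
A small sketch with made-up error values shows the effect: a single large error barely moves the mean absolute error but inflates RMSE considerably.

import numpy as np

errors = np.array([1.0, 1.0, 1.0, 10.0])  # three modest errors and one outlier

mae = np.mean(np.abs(errors))        # MAE weights all errors equally
rmse = np.sqrt(np.mean(errors**2))   # RMSE squares errors, amplifying the outlier

print(mae)   # 3.25
print(rmse)  # ≈ 5.07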


The second big danger is that optimizing models to get a better RMSE can compromise the greater goal. For example, the following highlights the issues Netflix had with using RMSE to evaluate who to award the prize:

Consider the Netflix Prize contest. Netflix offered a reward to any team able to improve upon the RMSE of its recommender system. The winning team used a complex combination of 107 different models to arrive at an RMSE of 0.8567, an improvement of around 10% over Netflix's own algorithm. It was later learned that, from a fully loaded cost perspective, the actual improvement over Netflix's model was a 0.005% reduction on the real 5-point rating scale. The IT costs of maintaining the complex combination of 107 models did not make sense once the actual reduction in error was considered.


So RMSE should not be the only metric used to evaluate an algorithm. From a business perspective, the evaluation should also consider the IT and operational costs of achieving the improvement in statistical measures like RMSE, weighed against the strategic gains made by spending that money. Analysts who have domain expertise, understand how the business functions, and know where the generated analytics will be used will do far more for the greater goal than blindly optimizing models for a better RMSE.