
Thursday, March 7, 2013

RMSD and MUE versus Correlation Coefficient: a simple illustration of the difference

Here is a simple illustration of what the root-mean-square deviation (RMSD), mean unsigned error (MUE), and correlation coefficient (r) can tell you about your data. Imagine that the x-axis is experimental data and the y-axis is computed data, in some arbitrary units. (You can access the data here.)

The blue dots represent a perfect correlation (y=x) for which r = 1 and RMSD = MUE = 0.
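The three statistics can be computed directly from their definitions. Below is a minimal pure-Python sketch; the function names and the experimental values `x` (the integers 1 to 10) are hypothetical and are not the post's data set. It checks that the perfect model y = x gives r = 1 and RMSD = MUE = 0.

```python
import math

def rmsd(x, y):
    """Root-mean-square deviation between experimental x and computed y."""
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / len(x))

def mue(x, y):
    """Mean unsigned (absolute) error between experimental x and computed y."""
    return sum(abs(yi - xi) for xi, yi in zip(x, y)) / len(x)

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical experimental values (not the post's data).
x = [float(i) for i in range(1, 11)]
blue = list(x)  # perfect model: y = x

print(pearson_r(x, blue), rmsd(x, blue), mue(x, blue))
```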

The red dots represent the function y = 2x - 5. The RMSD = 2.9 and MUE = 2.5, and both would seem to indicate a pretty crappy model. However, r = 1.0, indicating that the error is entirely systematic and can be fixed completely by a linear fit:

\[x=\frac{1}{2}y+2.5\]

In this case, the MUE is an indicator of the part of this systematic error that can be fixed by an offset.
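Here is a sketch of the red case, again using the hypothetical experimental values 1 to 10 rather than the post's data. The model y = 2x - 5 has r = 1 but a non-zero RMSD and MUE, and applying the inverse linear map x = y/2 + 2.5 removes the error completely.

```python
import math

def rmsd(x, y):
    """Root-mean-square deviation between x and y."""
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / len(x))

def mue(x, y):
    """Mean unsigned (absolute) error between x and y."""
    return sum(abs(yi - xi) for xi, yi in zip(x, y)) / len(x)

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical experimental values (not the post's data).
x = [float(i) for i in range(1, 11)]
red = [2 * xi - 5 for xi in x]               # systematically wrong model
corrected = [0.5 * yi + 2.5 for yi in red]   # linear fix: x = y/2 + 2.5

print(f"r = {pearson_r(x, red):.2f}")
print(f"RMSD {rmsd(x, red):.2f} -> {rmsd(x, corrected):.2f}")
print(f"MUE  {mue(x, red):.2f} -> {mue(x, corrected):.2f}")
```

With these hypothetical values the red model gives RMSD ≈ 2.92 and MUE = 2.50 before the correction, and both drop to 0 afterwards.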

The orange dots do not represent a linear function and clearly represent a worse model than the red dots. However, the RMSD = 2.6 and MUE = 2.1 are both slightly better than those of the red model. But r = 0.7, indicating that only part of the discrepancy can be fixed by a linear fit.

Indeed, a linear fit to the orange data (x=1.2y-2.75) can only reduce the RMSD and MUE to 2.1 and 1.8, respectively.
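The linear fit above expresses x as a function of y, i.e. an ordinary least-squares regression of x on y. The sketch below assumes a hypothetical non-linear model y = x + 3 sin(x) in place of the orange data. Because least squares minimizes the RMSD over all lines (including the identity x = y), the refit RMSD can never exceed the original one, but it stays above zero since the model is not linear. Least squares does not directly minimize the MUE, so a drop in MUE, as seen for the orange data, is not guaranteed in general.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between a and b."""
    return math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)) / len(a))

def mue(a, b):
    """Mean unsigned (absolute) error between a and b."""
    return sum(abs(bi - ai) for ai, bi in zip(a, b)) / len(a)

def fit_x_on_y(x, y):
    """Least-squares slope and intercept for x ≈ slope * y + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / syy
    return slope, mx - slope * my

# Hypothetical non-linear "orange" model (not the post's data).
x = [float(i) for i in range(1, 11)]
orange = [xi + 3.0 * math.sin(xi) for xi in x]

slope, intercept = fit_x_on_y(x, orange)
refit = [slope * yi + intercept for yi in orange]

print(f"x = {slope:.2f} y + {intercept:.2f}")
print(f"RMSD {rmsd(x, orange):.2f} -> {rmsd(x, refit):.2f}")
print(f"MUE  {mue(x, orange):.2f} -> {mue(x, refit):.2f}")
```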

The relationship between r and the RMSD after a linear fit is

\[\mathrm{RMSD}_{\mathrm{fit}}=\sigma_x\sqrt{1-r^2}\]

where \sigma_x is the standard deviation of the experimental data (computed by dividing by N, like the RMSD), which in this case is 2.9. For the orange data this gives 2.9\sqrt{1-0.7^2} \approx 2.1, the value quoted above. So knowing r and \sigma_x immediately tells you the lowest RMSD a model can reach using a linear fit.
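The identity can be checked numerically. This sketch uses hypothetical data and helper names: it compares the RMSD left after a least-squares fit of x on y with \sigma_x\sqrt{1-r^2}, using the population standard deviation (divide by N), and then plugs in the post's orange-data values \sigma_x = 2.9 and r = 0.7.

```python
import math

def pstdev(v):
    """Population standard deviation (divides by N, like the RMSD)."""
    m = sum(v) / len(v)
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / len(v))

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def rmsd_after_fit(x, y):
    """RMSD between x and its least-squares prediction slope * y + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((yi - my) ** 2 for yi in y))
    intercept = mx - slope * my
    return math.sqrt(sum((slope * yi + intercept - xi) ** 2
                         for xi, yi in zip(x, y)) / n)

# Hypothetical data (not the post's).
x = [float(i) for i in range(1, 11)]
y = [xi + 3.0 * math.sin(xi) for xi in x]
r = pearson_r(x, y)
print(rmsd_after_fit(x, y), pstdev(x) * math.sqrt(1 - r ** 2))  # equal up to rounding

# The post's orange-data values: sigma_x = 2.9, r = 0.7
print(round(2.9 * math.sqrt(1 - 0.7 ** 2), 2))  # 2.07
```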

Also, you can think of \sigma_x as the RMSD of the very simple model y = \langle x \rangle, i.e. a model that simply returns the average value of the experimental data. This is the maximum RMSD value after a linear fit (the case r = 0).
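A quick check of this point with hypothetical data: the RMSD of the constant model that always returns \langle x \rangle equals the population standard deviation from Python's `statistics.pstdev`. The sample standard deviation, `statistics.stdev`, divides by N - 1 instead and would not match.

```python
import math
import statistics

x = [float(i) for i in range(1, 11)]          # hypothetical experimental data
mean_model = [statistics.mean(x)] * len(x)    # model that always predicts <x>

rmsd_mean = math.sqrt(sum((m - xi) ** 2 for xi, m in zip(x, mean_model)) / len(x))
print(round(rmsd_mean, 3), round(statistics.pstdev(x), 3))  # 2.872 2.872
```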

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
