Here is a simple illustration of what the root-mean-square deviation (RMSD), mean unsigned error (MUE) and correlation coefficient (r) can tell you about your data. Imagine that the x-axis is experimental data and the y-axis is computed data in some arbitrary units. (You can access the data here.)
The blue dots represent a perfect correlation (y=x) for which r = 1 and RMSD = MUE = 0.
The red dots represent the function y=2x-5. The RMSD = 2.9 and MUE = 2.5, and both would seem to indicate a pretty crappy model. However, r = 1.0, indicating that the error is entirely systematic and can be fixed completely by a linear fit. In this case, the MUE is an indicator of the part of this systematic error that can be fixed by an offset [x=\frac{1}{2}y+2.5].
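As a rough check of the red-dot numbers, here is a minimal sketch. The post does not list the x values, so the code assumes they are the integers 1 to 10; that assumption happens to reproduce the quoted RMSD and MUE. The helper names `rmsd` and `mue` are made up for this illustration.

```python
import numpy as np

def rmsd(pred, ref):
    """Root-mean-square deviation between two equally sized arrays."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def mue(pred, ref):
    """Mean unsigned (absolute) error between two equally sized arrays."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.mean(np.abs(pred - ref)))

# ASSUMPTION: experimental values are 1..10 (not given in the post).
x = np.arange(1, 11, dtype=float)
y_red = 2 * x - 5  # the red model

red_rmsd = rmsd(y_red, x)
red_mue = mue(y_red, x)
red_r = float(np.corrcoef(x, y_red)[0, 1])
print(round(red_rmsd, 1), round(red_mue, 1), round(red_r, 1))  # 2.9 2.5 1.0

# Inverting the linear relation removes the systematic error entirely.
x_corrected = 0.5 * y_red + 2.5
print(rmsd(x_corrected, x))  # 0.0
```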
The orange dots do not represent a linear function and clearly represent a worse model than the red dots. However, the RMSD = 2.6 and MUE = 2.1 are both slightly better than for the red model. But r = 0.7, indicating that only part of the discrepancy can be fixed by a linear fit.
Indeed, a linear fit to the orange data (x=1.2y-2.75) can only reduce the RMSD and MUE to 2.1 and 1.8, respectively.
The relationship between r and the RMSD after a linear fit is
RMSD_{fit}=\sigma_x\sqrt{1-r^2}
where \sigma_x is the standard deviation of the experimental data, which in this case is 2.9. So knowing r and \sigma_x immediately tells you the lowest RMSD that a linear fit of the model can achieve.
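The relation can be checked numerically. The sketch below uses hypothetical noisy data (the orange data set is not reproduced here, and the seed and noise level are arbitrary), fits x as a linear function of y with `np.polyfit`, and compares the RMSD after the fit with \sigma_x\sqrt{1-r^2}. It assumes \sigma_x is the population standard deviation (NumPy's default, `ddof=0`).

```python
import numpy as np

# HYPOTHETICAL data: not the orange data set from the post.
rng = np.random.default_rng(0)                   # arbitrary fixed seed
x = np.arange(1, 11, dtype=float)                # "experimental" values
y = x + rng.normal(scale=2.0, size=x.size)       # noisy "computed" values

r = np.corrcoef(x, y)[0, 1]
sigma_x = np.std(x)                              # population std (ddof=0)

# Least-squares fit of x as a linear function of y: x ~ a*y + b
a, b = np.polyfit(y, x, 1)
x_fit = a * y + b

rmsd_fit = np.sqrt(np.mean((x_fit - x) ** 2))    # RMSD after the linear fit
rmsd_formula = sigma_x * np.sqrt(1 - r ** 2)     # prediction from r and sigma_x
print(np.isclose(rmsd_fit, rmsd_formula))        # True

# Plugging in the rounded values quoted in the post for the orange dots:
print(round(2.9 * np.sqrt(1 - 0.7 ** 2), 1))     # 2.1
```

The agreement is exact (up to rounding) only for an ordinary least-squares fit; any other line through the data gives a larger RMSD.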
Also, you can think of \sigma_x as the RMSD for the very simple model y=\langle x \rangle, i.e. the model simply returns the average value of the experimental data. This is the maximum RMSD value for a linear fit (where r = 0).
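A quick sketch of that baseline, again assuming (this is not stated in the post) that the experimental values are the integers 1 to 10: a model that always returns the mean of x has an RMSD equal to the population standard deviation of x.

```python
import numpy as np

# ASSUMPTION: experimental values are 1..10 (not given in the post).
x = np.arange(1, 11, dtype=float)

# "Model" that ignores its input and returns the experimental average.
y_mean_model = np.full_like(x, x.mean())

rmsd_mean = np.sqrt(np.mean((y_mean_model - x) ** 2))
print(np.isclose(rmsd_mean, np.std(x)))  # True
print(round(float(rmsd_mean), 1))        # 2.9
```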

This work is licensed under a Creative Commons Attribution 3.0 Unported License.