Tuesday, May 29, 2012

Locate Residuals on Residual Plots in R

ANOVA is one of the best-known techniques for analyzing variance. When we run an ANOVA in R, it is actually a general linear model that is fitted under the hood. In fact, much of statistics is built on linear models: ANOVA, MANOVA, regression, and many time series methods are just a few of the techniques that rely on linear model calculations in R. In other words, the underlying mechanism of these techniques is none other than the linear model.
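As a small illustration of this point, the sketch below fits the same one-way layout with aov and with lm and compares the results. It uses the PlantGrowth data set that ships with R purely as a stand-in (it is not the data used in this post), and the names fit_aov and fit_lm are made up for the example.

```r
# PlantGrowth comes with base R; it is only a stand-in for illustration.
fit_aov <- aov(weight ~ group, data = PlantGrowth)
fit_lm  <- lm(weight ~ group, data = PlantGrowth)

# An aov object also inherits from "lm", so the linear model machinery is reused.
class(fit_aov)

# The residuals from the two fits agree.
all.equal(residuals(fit_aov), residuals(fit_lm))

# The ANOVA table can be obtained from the lm fit as well.
anova(fit_lm)
```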

In linear models, that is, when we do regression, ANOVA, or MANOVA, it is all about the residuals. Would you agree? Finding the F value and then the p-value is not the whole story; the real analysis starts when we shift our attention to the residuals (errors).

I like R for many reasons. One of them is that I find it better than other statistical software, not only in reliability but also in its strong visualization capabilities. Okay, to cut it short, let me ask you a question: if I want to know the value of a particular residual on a residual plot, what do I need to do? In fact, there is a beautiful command for marking residuals on residual plots in R.

For instance, say I am fitting a model with lm in R and want to plot the residuals. I can do that by executing the following code:


> plot(fitted.values(linuxglm), residuals(linuxglm), xlab = "Fitted Values", ylab = "Residuals", col = "red"); abline(h=0)
 
In the above code, linuxglm refers to my fitted linear model. The result is a scatter plot of residuals against fitted values, with a horizontal line at zero.
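Since the linuxglm model itself is not shown in this post, here is a minimal reproducible sketch of the same kind of plot. It assumes a simple regression on the built-in cars data as a hypothetical stand-in; the object name fit is made up for the example.

```r
# Hypothetical stand-in for linuxglm: a simple regression on the built-in cars data.
fit <- lm(dist ~ speed, data = cars)

# Residuals against fitted values, as in the command above.
plot(fitted.values(fit), residuals(fit),
     xlab = "Fitted Values", ylab = "Residuals", col = "red")
abline(h = 0)  # reference line at zero residual
```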
I am not entirely satisfied with this visualization. I would like to dig a little deeper into the data by seeing values next to some of the points (though not for each and every point, which would make the graph messy). In this plot there are two points that stand out, one at the top middle of the graph and the other at the bottom right (perhaps I can call them outliers). I would just like to know about the residuals of these two points. That can be done by executing the following line:
 
> identify(fitted.values(linuxglm), residuals(linuxglm), col = "red")
 
The moment we press Return, R becomes busy waiting for input, and RStudio shows the message locator active (Escape to finish). Don't press Esc yet: first click on the points whose residuals you would like to identify, then press the Esc button. The result is printed in the console:
 
[1]   4 492
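Note that identify returns the positions (row numbers) of the clicked points rather than the residuals themselves. A minimal sketch of looking up the actual residual values from such indices follows. It again assumes the built-in cars data as a stand-in model, and the indices in picked are made up for illustration, as if they had come back from identify.

```r
fit <- lm(dist ~ speed, data = cars)   # hypothetical stand-in model

# Pretend these indices came back from identify(); they are made up for illustration.
picked <- c(23, 49)

# Residuals of the selected observations.
residuals(fit)[picked]

# The corresponding rows of the data, with fitted values and residuals attached.
cbind(cars[picked, ],
      fitted   = fitted.values(fit)[picked],
      residual = residuals(fit)[picked])
```

With the model from this post, the same lookup would be residuals(linuxglm)[c(4, 492)].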
 
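These numbers are the observation (row) numbers of the selected points, and the same numbers are drawn next to those points on the graph.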
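If the residual values themselves should appear on the plot, the labels argument of identify can be used. The sketch below is one possible way to do that, again on the built-in cars data as a stand-in; the names fv, res, and picked are made up for the example. Because identify needs an interactive graphics device, the sketch falls back to labelling the two largest absolute residuals with text when run non-interactively.

```r
fit <- lm(dist ~ speed, data = cars)   # hypothetical stand-in model
fv  <- fitted.values(fit)
res <- residuals(fit)

plot(fv, res, xlab = "Fitted Values", ylab = "Residuals", col = "red")
abline(h = 0)

if (interactive()) {
  # Click up to two points; each is labelled with its rounded residual.
  picked <- identify(fv, res, labels = round(res, 2), n = 2)
} else {
  # Non-interactive fallback: label the two largest absolute residuals,
  # drawing each label just below its point.
  picked <- order(abs(res), decreasing = TRUE)[1:2]
  text(fv[picked], res[picked], labels = round(res[picked], 2), pos = 1)
}
```

Setting n = 2 makes identify stop on its own after two clicks, so pressing Esc is not needed in that case.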
 
  
