
Using ggpairs with NA-containing data

Tags:

r

ggplot2

ggpairs in the GGally package seems pretty useful, but it appears to fail when an NA is present anywhere in the data set:

library(GGally)
data(tips, package="reshape")
pm <- ggpairs(tips[,1:3]) # works just fine

#introduce NA
tips[1,1] <- NA
ggpairs(tips[,1:3])
> Error in if (lims[1] > lims[2]) { : missing value where TRUE/FALSE needed

I don't see any documentation for dealing with NA values, and solutions like ggpairs(tips[,1:3], na.rm=TRUE) (unsurprisingly) don't change the error message.

I have a data set in which perhaps 10% of values are NA, scattered randomly throughout. Because na.omit(myDataSet) drops every row that contains at least one NA, it would remove much of the data. Is there any way around this?

Drew Steen asked Oct 26 '12 20:10



1 Answer

Some GGally functions, such as ggparcoord(), can handle NAs through the missing parameter, which accepts "exclude", "mean", "median", "min10", or "random". Unfortunately, this is not the case for ggpairs().

What you can do is replace the NAs yourself with a reasonable estimate, which is the imputation you were expecting ggpairs() to do for you automatically. Good options include replacing them with row means, zeros, the median, or the closest point.
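As a rough illustration of the median option, here is a minimal base-R sketch. The helper impute_median() is hypothetical (it is not part of GGally), and the toy data frame is made up to stand in for tips[,1:3]. Whether median imputation is appropriate depends on your data.

```r
# Hypothetical helper (not part of GGally): replace NAs in each numeric
# column with that column's median; non-numeric columns are left unchanged.
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      missing <- is.na(df[[col]])
      df[[col]][missing] <- median(df[[col]], na.rm = TRUE)
    }
  }
  df
}

# Made-up data standing in for tips[,1:3], with NAs scattered in
toy <- data.frame(
  total_bill = c(NA, 10.34, 21.01, 23.68),
  tip        = c(1.01, 1.66, NA, 3.31),
  sex        = factor(c("Female", "Male", "Male", "Male"))
)

filled <- impute_median(toy)
```

With GGally loaded, the imputed data can then be passed on, for example ggpairs(impute_median(tips[,1:3])). Note that NAs in factor columns such as sex are not touched by this sketch and would need separate handling.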

Ali answered Oct 15 '22 06:10
