Creating boxplots in R using lattice for already processed data

Tags: plot, r, lattice

I am trying to create a boxplot in R of an extremely large data set. The file containing the data is 2.5G and crashes R if I try to import it. Fortunately, another piece of (python) software can generate the mean and variance without a problem, which is all I really want to plot (for now).

Every tutorial I've found so far requires you to input the full data set, and then R computes the statistics itself. I was wondering how to pass the mean, median, min, max, etc. to bwplot just for plotting. I prefer R and lattice because they integrate well with the software suite the code might end up in. If I used MATLAB or some other software, that would be a problem, because it would be yet another requirement for our current users.

Paul asked Jan 22 '26 01:01


1 Answer

Boxplots do not plot mean or variance. You actually need the full ranked data to plot a proper boxplot, because the quantities shown are the median, the quartiles, the whisker ends (the most extreme actual data points lying within 1.5 times the IQR of the box), and every data point outside that range (the outliers). This is typically not a good idea for a large data set, because with millions of observations even a small fraction beyond the whiskers amounts to a very large number of outliers.
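
To make that list of quantities concrete, base R's boxplot.stats() returns what boxplot() draws for a single group. The sketch below runs it on a small made-up vector (the values are illustrative, not from the question). Note that neither the mean nor the variance appears in the result, that the hinges are Tukey's version of the quartiles (computed by fivenum()), and that the whisker ends are actual data values.

```r
# Small made-up vector: nine ordinary values and one extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

# boxplot.stats() computes what boxplot() would draw for this group.
s <- boxplot.stats(x, coef = 1.5)

# Five numbers: lower whisker end, lower hinge, median, upper hinge,
# upper whisker end. The whisker ends are real data points (here 1 and 9),
# the most extreme values within 1.5 * IQR of the hinges.
print(s$stats)  # 1.0 3.0 5.5 8.0 9.0

# Every point beyond the whiskers is returned (and drawn) individually.
print(s$out)    # 50
```

For a data set the size of the one in the question, the out component alone can hold a huge number of values, which is the practical problem described above.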

That said, you can generate the essential summaries any way you want and use bxp to plot them; see ?bxp in R. Just make sure you clarify which quantities you are plotting if they are not the ones above.
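
A minimal sketch of the bxp() route, assuming the external tool can export quantile-based summaries per group (the mean and variance alone do not determine them). The data frame summ, its column names, and all numbers below are hypothetical placeholders; in practice they would be read from a small file, for example with read.csv(). bxp() takes a list shaped like the value of boxplot(): stats is a 5 x k matrix with one column per box, n holds the group sizes (bxp() counts the boxes from it, so it must be present), and names labels the boxes.

```r
# Hypothetical per-group summaries, e.g. produced by the external (python)
# tool and read back with read.csv(). All values here are placeholders.
summ <- data.frame(
  group      = c("A", "B"),
  lo_whisker = c(0.5, 1.0),
  q1         = c(2.0, 2.5),
  median     = c(3.1, 3.8),
  q3         = c(4.2, 5.0),
  hi_whisker = c(6.0, 7.5),
  n          = c(1e6, 2e6),
  stringsAsFactors = FALSE
)

# bxp() expects a list shaped like the value of boxplot():
#   stats: 5 x k matrix, one column per box, rows in the order
#          lower whisker, lower hinge, median, upper hinge, upper whisker
#   n:     number of observations per box (bxp() counts boxes from it)
#   names: box labels
z <- list(
  stats = t(as.matrix(summ[, c("lo_whisker", "q1", "median", "q3", "hi_whisker")])),
  n     = summ$n,
  names = summ$group
)

# No outliers were summarised, so do not try to draw any.
# bxp() invisibly returns the box centre positions.
pos <- bxp(z, outline = FALSE)
```

This is base graphics rather than lattice, so it will not combine with bwplot() panels. If the tool can also export the points beyond the whiskers, they can be supplied as the out and group components of the list (group giving the box index of each outlier) and drawn with the default outline = TRUE.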

Simon Urbanek answered Jan 24 '26 17:01