I have a data frame and I want to learn how summary generates its information. Specifically, how does summary generate a count of the number of elements in each level of a factor? I can use summary, but I want to learn how to work with factors better. When I try ?summary, I just get the general info. Is this impossible because it is in bytecode?
If you want to view the code built into the R interpreter, you will need to download and unpack the R sources, or you can view the sources online via the R Subversion repository or Winston Chang's GitHub mirror.
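As a rough way to tell whether a function is implemented inside the interpreter or in ordinary R code, the following sketch uses base R's is.primitive() and body(). The choice of sum and summary is only for illustration, and the variable names are made up for this example.

```r
# Check whether each function is a primitive (implemented in C inside the interpreter)
sum_is_primitive <- is.primitive(sum)
summary_is_primitive <- is.primitive(summary)

print(sum_is_primitive)      # sum has no R-level body to inspect
print(summary_is_primitive)  # summary is an ordinary R closure

# For a closure, the R-level body can be inspected even when it is byte-compiled
summary_body <- body(summary)
print(summary_body)
```

For functions that turn out to be primitives, the R sources mentioned above are where the implementation lives.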
What we see when you type summary is
> summary
function (object, ...)
UseMethod("summary")
<bytecode: 0x0456f73c>
<environment: namespace:base>
This is telling us that summary is a generic function and has many methods attached to it. To see what those methods are actually called we can try
> methods(summary)
[1] summary.aov summary.aovlist summary.aspell*
[4] summary.connection summary.data.frame summary.Date
[7] summary.default summary.ecdf* summary.factor
[10] summary.glm summary.infl summary.lm
[13] summary.loess* summary.manova summary.matrix
[16] summary.mlm summary.nls* summary.packageStatus*
[19] summary.PDF_Dictionary* summary.PDF_Stream* summary.POSIXct
[22] summary.POSIXlt summary.ppr* summary.prcomp*
[25] summary.princomp* summary.srcfile summary.srcref
[28] summary.stepfun summary.stl* summary.table
[31] summary.tukeysmooth*
Non-visible functions are asterisked
Here we see all the methods associated with the summary function. This means that different code runs when you call summary on an lm object than when you call summary on a data.frame. This is good because we wouldn't expect the summary to be computed the same way for those two objects.
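To make the dispatch concrete for the factor case from the question, here is a small sketch. The factor f and the variable names are made up for illustration. It checks that calling the generic on a factor gives the same result as calling summary.factor directly, and that the counts agree with table(). It does not claim that table() is how summary.factor is implemented; print summary.factor to see the actual code for your R version.

```r
# A small example factor (hypothetical data)
f <- factor(c("a", "b", "a", "c", "a"))

# The class of the object decides which summary method is used
print(class(f))

# Calling the generic and calling the method directly
via_generic <- summary(f)
via_method <- summary.factor(f)
same_result <- identical(via_generic, via_method)
print(same_result)

# The per-level counts agree with what table() reports
level_counts <- table(f)
counts_agree <- all(via_generic == as.vector(level_counts))
print(via_generic)
print(counts_agree)
```

If you only need the counts and not the rest of the summary machinery, table(f) is usually the more direct tool.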
To see the code that is run when you call summary on a data.frame, you can just type summary.data.frame as shown in the methods list. You'll be able to examine it and study it and do whatever you want with the printed code. You mentioned that you were interested in factors, so you will probably want to examine the output of summary.factor.
Now you might notice that some of the methods printed had an asterisk (*) next to them, which implies that they're non-visible. This essentially means that you can't just type the name of the function to view the code.
> summary.prcomp
Error: object 'summary.prcomp' not found
However, if you're determined to see what the code actually is, you can use the getAnywhere function to view it.
> getAnywhere(summary.prcomp)
A single object matching ‘summary.prcomp’ was found
It was found in the following places
registered S3 method for summary from namespace stats
namespace:stats
with value
function (object, ...)
{
vars <- object$sdev^2
vars <- vars/sum(vars)
importance <- rbind(`Standard deviation` = object$sdev, `Proportion of Variance` = round(vars,
5), `Cumulative Proportion` = round(cumsum(vars), 5))
colnames(importance) <- colnames(object$rotation)
object$importance <- importance
class(object) <- "summary.prcomp"
object
}
<bytecode: 0x03e15d54>
<environment: namespace:stats>
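Besides getAnywhere, there are a couple of other ways to get at a non-visible method. The sketch below uses getS3method() and the ::: operator, both part of R itself, and pulls the function out of the getAnywhere result via its objs component. The variable names are made up for illustration.

```r
# Look up the registered S3 method by generic name and class
m_s3 <- getS3method("summary", "prcomp")

# Access the unexported object directly in the stats namespace
m_ns <- stats:::summary.prcomp

# getAnywhere returns a list-like object; the matches are stored in $objs
found <- getAnywhere("summary.prcomp")
m_any <- found$objs[[1]]

# Each of these is a function whose code can be printed and studied
print(is.function(m_s3))
print(is.function(m_ns))
print(is.function(m_any))
```

getS3method is handy when you know the generic and the class; getAnywhere is more useful when you are not sure which package the function lives in.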
Hopefully this helps you explore the code in R much more easily in the future.
For even more details, see Volume 6/4 of The R Journal (warning: PDF) and read Uwe Ligges's "R Help Desk" section, which deals with viewing the source code of R functions.