It sometimes happens in R that a function prints output to the console that is never returned as an object. For example, factanal prints the proportion of variance explained as well as a loading matrix with small loadings hidden, but the printed values are not all directly available to the user as returned objects. While in some cases there are special extractor functions, often you have to dig into the relevant print method and extract the code that generates the information if you want to use it in subsequent analyses.
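For factanal specifically, digging into the print method turns out to be manageable. The sketch below assumes the built-in ability.cov covariance list instead of random data, purely so the fit is reproducible, and the name variance_table is made up for illustration. It recomputes the variance rows from the stored loadings, which is what the print method for loadings appears to do for an uncorrelated-factor solution.

```r
# Reproducible fit on a built-in dataset (an assumption; the post uses random data)
fit <- factanal(factors = 2, covmat = ability.cov)

# Full loading matrix, including the small values that print() blanks out
L <- unclass(loadings(fit))

# The rows printed under the loadings can be recomputed from L
ss_loadings    <- colSums(L^2)
proportion_var <- ss_loadings / nrow(L)
cumulative_var <- cumsum(proportion_var)

# Hypothetical name; mirrors the layout of the printed table
variance_table <- rbind(
  "SS loadings"    = ss_loadings,
  "Proportion Var" = proportion_var,
  "Cumulative Var" = cumulative_var
)
variance_table
```

This works only because the formulas happen to be simple here; it does not generalise to other functions, which is the motivation for parsing the printed output instead.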
Thus, I was curious whether there was a general function that is able to parse printed output in R. Ideally such a function would recognise when a matrix/data.frame is being printed and return that as one element of a list in a data.frame or matrix format.
To make this specific, I use the factanal function. If we run the following code:
fit <- factanal(matrix(rnorm(1000), ncol = 5), 2)
fit
we get the following output:
Call:
factanal(x = matrix(rnorm(1000), ncol = 5), factors = 2)
Uniquenesses:
[1] 0.005 0.990 0.994 0.807 0.915
Loadings:
Factor1 Factor2
[1,] 0.974 0.216
[2,]
[3,]
[4,] 0.429
[5,] 0.289
Factor1 Factor2
SS loadings 0.963 0.327
Proportion Var 0.193 0.065
Cumulative Var 0.193 0.258
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.27 on 1 degree of freedom.
The p-value is 0.601
There are various ways of parsing the above text and returning it as a meaningful object. One option would simply be to return a character vector with one element per line. A more sophisticated approach would return a list whose elements are the blocks separated by empty lines, and would somehow recognise where a matrix is being presented. It's quite clear that "Loadings:" followed by the "Factor1 Factor2" header marks the beginning of a matrix.
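The list-of-blocks idea can be sketched in base R as follows. The helper name split_blocks is hypothetical, and the fit again assumes the built-in ability.cov data so that the example is reproducible. It only splits on blank lines and makes no attempt to recognise matrices.

```r
# Hypothetical helper: split captured lines into blocks at blank lines
split_blocks <- function(lines) {
  blank <- !nzchar(trimws(lines))      # TRUE for empty or whitespace-only lines
  block_id <- cumsum(blank)            # counter that increases at every blank line
  unname(split(lines[!blank], block_id[!blank]))
}

# Reproducible fit on a built-in dataset (an assumption; the post uses random data)
fit <- factanal(factors = 2, covmat = ability.cov)
blocks <- split_blocks(capture.output(print(fit)))

length(blocks)   # number of blank-line-separated blocks
blocks[[1]]      # the "Call:" block
```

Each list element is still a character vector, so a second step is needed to turn the matrix-like blocks into numeric objects.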
For concreteness, I would be interested in how such approaches could be applied to the specific case above, i.e. extracting the content printed for a factor analysis.
What follows is a partial answer using Jota's recommendation of capture.output. It also draws on readr::read_fwf and readr::fwf_empty to intelligently extract the fixed-width data.
So we fit a factor analysis:
fit <- factanal(matrix(rnorm(1000), ncol = 5), 2)
We then capture the output with capture.output():
all_output <- capture.output(fit)
Here's an abbreviated version of what all_output looks like (i.e., a character vector with one element per line of output):
[1] ""
[2] "Call:"
[3] "factanal(x = matrix(rnorm(1000), ncol = 5), factors = 2)"
...
[15] ""
[16] " Factor1 Factor2"
[17] "SS loadings 1.055 0.363"
[18] "Proportion Var 0.211 0.073"
[19] "Cumulative Var 0.211 0.284"
Then we can extract the relevant lines, combine them into a single string with newline separators, as expected by read_fwf, and use fwf_empty to guess where the fields are divided:
x <- all_output[16:19] # extract the lines holding the variance table (header plus three rows)
xc <- paste(x, collapse = "\n") # combine into a single string separated by newlines
readr::read_fwf(xc, readr::fwf_empty(xc)) # use fwf_empty to auto-detect column positions
This returns a data frame:
# A tibble: 4 × 3
  X1             X2      X3
  <chr>          <chr>   <chr>
1 <NA>           Factor1 Factor2
2 SS loadings    1.055   0.363
3 Proportion Var 0.211   0.073
4 Cumulative Var 0.211   0.284
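Two limitations of the result above are that the line numbers 16:19 are hard-coded and that every column comes back as character, with the header stored as a data row. A rough base R sketch that locates the table by its text and returns a numeric matrix is shown here. It assumes the built-in ability.cov data for reproducibility, and it assumes that the header sits directly above the "SS loadings" row with the two variance rows directly below, which is what the output shown earlier suggests for a two-factor fit.

```r
# Reproducible fit on a built-in dataset (an assumption; the post uses random data)
fit <- factanal(factors = 2, covmat = ability.cov)
all_output <- capture.output(print(fit))

# Locate the table by its first row label instead of hard-coding line numbers
start  <- grep("^SS loadings", all_output)
header <- all_output[start - 1]          # assumed: "Factor1 Factor2" line
rows   <- all_output[start:(start + 2)]  # assumed: SS, Proportion and Cumulative rows

# Pull the numbers out of each row (the row labels contain no digits)
nums   <- regmatches(rows, gregexpr("-?[0-9]+\\.?[0-9]*", rows))
values <- do.call(rbind, lapply(nums, as.numeric))

# Row labels are whatever precedes the first number; column names come from the header
dimnames(values) <- list(
  trimws(sub("\\s+-?[0-9].*$", "", rows)),
  scan(text = header, what = "", quiet = TRUE)
)
values
```

The parsed values carry only the precision that was printed, so recomputing from the fitted object is preferable whenever the underlying formula is known.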
Probably a more sophisticated answer would:
nchar(capture.output(fit))