
R function that parses output printed to console and returns meaningful object

Tags:

r

It sometimes happens in R that a function prints output to the console that is never returned as an object. For example, printing a factanal fit shows the proportion of variance explained as well as a loading matrix with small loadings hidden, but these are not directly available to the user as returned objects in that form. While in some cases there are special extractor functions, often you have to dig into the relevant print method and extract the code that generates the information if you want to use it in subsequent analyses.
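As an illustration of the "dig into the print method" route for factanal specifically: the variance table appears to be computed inside stats:::print.loadings as column sums of squared loadings, so it can be rebuilt from fit$loadings. The following is a minimal sketch; the simulated data and the names scores, pattern, lambda and variance_table are assumptions made for this example, not part of the original analysis.

```r
# Simulated data with a genuine two-factor structure (an assumption made here
# so that factanal() converges reliably; not the data used in the question).
set.seed(1)
scores  <- matrix(rnorm(400), ncol = 2)
pattern <- cbind(c(0.8, 0.7, 0.6, 0, 0, 0), c(0, 0, 0, 0.8, 0.7, 0.6))
x   <- scores %*% t(pattern) + matrix(rnorm(1200, sd = 0.5), ncol = 6)
fit <- factanal(x, factors = 2)

# Drop the "loadings" class to get a plain numeric matrix with no blanked cells.
lambda <- unclass(fit$loadings)

# Rebuild the table that is printed below the loadings.
ss <- colSums(lambda^2)
variance_table <- rbind(
  "SS loadings"    = ss,
  "Proportion Var" = ss / nrow(lambda),
  "Cumulative Var" = cumsum(ss / nrow(lambda))
)
variance_table
```

This avoids parsing text altogether, but it only works once you know how the print method derives its numbers, which is exactly the manual step the question is trying to avoid.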

Thus, I was curious whether there was a general function that is able to parse printed output in R. Ideally such a function would recognise when a matrix/data.frame is being printed and return that as one element of a list in a data.frame or matrix format.

To make this specific, I use the factanal function. If we run the following code:

fit <- factanal(matrix(rnorm(1000), ncol = 5), 2)
fit

we get the following output:

Call:
factanal(x = matrix(rnorm(1000), ncol = 5), factors = 2)

Uniquenesses:
[1] 0.005 0.990 0.994 0.807 0.915

Loadings:
     Factor1 Factor2
[1,]  0.974   0.216 
[2,]                
[3,]                
[4,]          0.429 
[5,]          0.289 

               Factor1 Factor2
SS loadings      0.963   0.327
Proportion Var   0.193   0.065
Cumulative Var   0.193   0.258

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.27 on 1 degree of freedom.
The p-value is 0.601 

There would be various ways of parsing the above text and returning it as a meaningful object. One option would simply be to return a character vector with one element per printed line. A more sophisticated approach would return a list in which the output is split into elements wherever there is an empty line, and it would somehow recognise where a matrix is being presented. To a human reader it is quite clear that "Loadings:..." followed by the "Factor1 Factor2" header marks the beginning of a matrix.
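The "split on empty lines" idea can be sketched in a few lines of base R. The function name split_blocks is hypothetical, and the input here is a hand-typed excerpt of the printed output above rather than a live capture:

```r
# Split a character vector of printed lines into blocks separated by blank lines.
split_blocks <- function(lines) {
  blank    <- !nzchar(trimws(lines))   # TRUE for empty or whitespace-only lines
  block_id <- cumsum(blank)            # increments at every blank line
  unname(split(lines[!blank], block_id[!blank]))
}

# Excerpt of the printed factanal output shown above.
lines <- c(
  "",
  "Call:",
  "factanal(x = matrix(rnorm(1000), ncol = 5), factors = 2)",
  "",
  "Uniquenesses:",
  "[1] 0.005 0.990 0.994 0.807 0.915",
  "",
  "               Factor1 Factor2",
  "SS loadings      0.963   0.327",
  "Proportion Var   0.193   0.065",
  "Cumulative Var   0.193   0.258"
)

blocks <- split_blocks(lines)
length(blocks)  # 3
```

Each list element is still plain text; recognising which blocks are matrices is a separate step.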

Question

  • Is there a general purpose function in R that can be used to extract the elements of printed output?
  • Alternatively, is there a general approach that works well for extracting content from output printed to the console?

For concreteness, I would be interested in how such approaches could be applied to the specific case above, i.e., extracting the content of a printed factor analysis.

Jeromy Anglim asked Sep 10 '25 13:09

1 Answer

What follows is a partial answer using Jota's recommendation of capture.output. It also draws on readr::read_fwf and readr::fwf_empty to intelligently extract the fixed-width data.

So we fit a factor analysis:

fit <- factanal(matrix(rnorm(1000), ncol = 5), 2)

We then capture the output with capture.output().

all_output <- capture.output(fit)

Here's an abbreviated version of what all_output looks like (i.e., a character vector with one line per element of the vector):

 [1] ""                                                        
 [2] "Call:"                                                   
 [3] "factanal(x = matrix(rnorm(1000), ncol = 5), factors = 2)"
...                                
[15] ""                                                        
[16] "               Factor1 Factor2"                          
[17] "SS loadings      1.055   0.363"                          
[18] "Proportion Var   0.211   0.073"                          
[19] "Cumulative Var   0.211   0.284"                          

Then we can extract the relevant lines, combine them into a single string separated by newlines (the form read_fwf expects for literal data), and use fwf_empty to guess where the fields are divided:

x <- all_output[16:19] # extract the lines containing the printed table
xc <- paste(x, collapse = "\n") # combine into a single string with newline separators
readr::read_fwf(xc, readr::fwf_empty(xc)) # use fwf_empty to auto-detect field widths

This returns a data frame:

# A tibble: 4 × 3
              X1      X2      X3
           <chr>   <chr>   <chr>
1           <NA> Factor1 Factor2
2    SS loadings   1.055   0.363
3 Proportion Var   0.211   0.073
4 Cumulative Var   0.211   0.284
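As a partial step towards proper names, the same block can also be parsed in base R by treating the first line as the header and the trailing fields of each remaining line as the numbers. This is only a sketch: parse_printed_matrix is a hypothetical helper, and it assumes that every data row has a value in every column (so it would not cope with the blanked cells in the Loadings block).

```r
# Parse a printed numeric table: line 1 holds column names, every other line
# holds a row label (possibly with spaces) followed by one number per column.
parse_printed_matrix <- function(lines) {
  col_names <- strsplit(trimws(lines[1]), "\\s+")[[1]]
  k <- length(col_names)
  tokens <- strsplit(trimws(lines[-1]), "\\s+")
  # The last k tokens of each row are the values; the rest form the row label.
  values <- do.call(rbind, lapply(tokens, function(tok) as.numeric(tail(tok, k))))
  row_names <- vapply(tokens, function(tok) paste(head(tok, -k), collapse = " "),
                      character(1))
  dimnames(values) <- list(row_names, col_names)
  values
}

# The four lines extracted above (all_output[16:19]), typed in literally.
x <- c(
  "               Factor1 Factor2",
  "SS loadings      1.055   0.363",
  "Proportion Var   0.211   0.073",
  "Cumulative Var   0.211   0.284"
)

m <- parse_printed_matrix(x)
m["Proportion Var", "Factor2"]
```

The result is a numeric matrix with row and column names, so it can be indexed by label rather than by position.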

Probably a more sophisticated answer would:

  • resolve the issue that data.frame names are not correct
  • automatically parse the output into discrete elements perhaps based on the presence of empty lines
  • perhaps auto-detect when the first column is meant to be row names
  • auto-detect where a matrix/data.frame has been printed perhaps based on the presence of rows with equal numbers of characters: e.g., nchar(capture.output(fit))
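The last point, detecting a printed matrix from runs of lines with equal width, could be sketched with rle() on nchar(). The helper name find_equal_width_runs and the min_rows threshold are assumptions for illustration, and the input is a hand-typed excerpt of printed factanal output rather than a live capture.

```r
# Find runs of consecutive non-empty lines that all have the same width,
# which is a rough signal that a matrix or data frame was printed there.
find_equal_width_runs <- function(lines, min_rows = 2) {
  runs  <- rle(nchar(lines))
  end   <- cumsum(runs$lengths)
  start <- end - runs$lengths + 1
  keep  <- runs$lengths >= min_rows & runs$values > 0
  data.frame(start = start[keep], end = end[keep], width = runs$values[keep])
}

lines <- c(
  "",
  "Uniquenesses:",
  "[1] 0.005 0.990 0.994 0.807 0.915",
  "",
  "               Factor1 Factor2",
  "SS loadings      0.963   0.327",
  "Proportion Var   0.193   0.065",
  "Cumulative Var   0.193   0.258"
)

runs <- find_equal_width_runs(lines)
runs
```

This is a heuristic only: two unrelated lines can happen to share a width, and a wide matrix that wraps across several chunks would show up as separate runs.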

Jeromy Anglim answered Sep 13 '25 04:09