There appear to be similar questions to this in other languages but I can't find one in R.
I have a number of text files in the subdirectories of a directory; they all have the extension (.log) and they contain a mixture of text and data. I want to extract a couple of lines from these relatively large files.
For example, one file goes as follows ...
blahblahblah
NUMBER OF CARTESIAN GAUSSIAN BASIS FUNCTIONS = 210
blahblahblah
----------------------------------------
CPU timing information for all processes
========================================
0: 8853.469 + 133.948 = 8987.417
1: 8850.817 + 126.587 = 8977.405
2: 8851.925 + 128.576 = 8980.501
3: 8847.992 + 125.871 = 8973.864
----------------------------------------
ddikick.x: exited gracefully.
blahblahblah
I want to harvest the number of basis functions (210 in this example) and the total CPU times (the last column of the timing table).
The line "NUMBER OF CARTESIAN GAUSSIAN BASIS FUNCTIONS =" occurs only once in each file; i.e., if I open the file in a text editor and search for this string, only this one line is returned. The same is true of "CPU timing information for all processes" and "exited gracefully".
I appreciate that it appears that I haven't done a lot to help myself but I just don't know where to start. If someone could point me in the right direction, I hope to be able to fill in the rest.
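After the help given to me by @Ben (see below), here is the code that I ended up using:
library(stringr)  # provides str_extract() and str_extract_all()

filesearch <- function(x) {
  f <- readLines(x)
  # Line holding the basis-function count; the number is the last token on it
  cline <- grep("NUMBER OF CARTESIAN GAUSSIAN BASIS FUNCTIONS", f,
                value = TRUE)
  val <- as.numeric(str_extract(cline, "[0-9]+$"))
  # The four per-process timing lines start two lines below the header
  coline <- grep("^ +CPU timing information", f)
  numstr <- sapply(str_extract_all(f[coline + 2:5], "[0-9.]+"), as.numeric)
  # Row 4 holds the per-process totals; sum them and divide by 60
  cline1 <- sum(numstr[4, ]) / 60
  output <- c(val, cline1)
  cat(output, "\n")
  # cat() itself returns NULL, so hand back the values explicitly
  invisible(output)
}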
I sourced this function and keyed in the file that I needed each time, then I transferred the two results to another file by hand. Not as elegant as I'd like but it saved me a lot of time doing it this way. Thanks again to @Ben.
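Maybe this will work?
library(stringr)
f <- readLines("datafile.txt")
# keep only the line that mentions the basis functions
cline <- grep("NUMBER OF CARTESIAN GAUSSIAN BASIS FUNCTIONS", f,
              value = TRUE)
# the count is the run of digits at the end of that line
val <- as.numeric(str_extract(cline, "[0-9]+$"))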
To get the other values, try
cline <- grep("^ +CPU timing information", f)
(numstr <- sapply(str_extract_all(f[cline + 2:5], "[0-9.]+"), as.numeric))
##          [,1]     [,2]     [,3]     [,4]
## [1,]    0.000    1.000    2.000    3.000
## [2,] 8853.469 8850.817 8851.925 8847.992
## [3,]  133.948  126.587  128.576  125.871
## [4,] 8987.417 8977.405 8980.501 8973.864
The sapply has transposed the matrix of values, so the last row is the bit we want (it corresponds to the last column in the file). Extract it using numstr[4,] or numstr[nrow(numstr),] or tail(numstr,1).
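To see why the totals end up in the last row, here is a small sketch using two timing lines copied from the question. It is only an illustration: base R's regmatches() and gregexpr() stand in for str_extract_all() so that it runs without stringr, and the names rows, pieces, by_row and by_tail are made up for this example.

```r
# Two sample timing lines copied from the log excerpt in the question
rows <- c("0: 8853.469 + 133.948 = 8987.417",
          "1: 8850.817 + 126.587 = 8977.405")

# Base-R stand-in for str_extract_all(): one character vector per line
pieces <- regmatches(rows, gregexpr("[0-9.]+", rows))

# sapply() binds each line's numbers as a column, so the fields become rows
numstr <- sapply(pieces, as.numeric)
dim(numstr)  # 4 rows (fields) by 2 columns (lines)

# Three ways to pull out the totals, which are the last field on each line
by_index <- numstr[4, ]
by_row   <- numstr[nrow(numstr), ]
by_tail  <- as.vector(tail(numstr, 1))  # tail() keeps a 1-row matrix, so flatten it
```

Indexing with nrow(numstr) or tail() avoids hard-coding the 4, which may be handy if the number of fields per line ever differs.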
(edit: allow spaces before the "CPU timing" string) (edit: do it right!)
(To do this for all the log files, package it in a function and use list.files(pattern="\\.log$") in combination with sapply ...)
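A rough sketch of that batch step might look like the following. Several things here are assumptions rather than anything from the thread: the helper names extract_log and harvest_logs and the labels nbasis and cpu_total are invented for illustration, base R's regmatches() and gregexpr() replace stringr so the block has no package dependency, the header pattern "^ *" tolerates zero or more leading spaces, and, as in the code above, exactly four timing lines are expected two lines below the header. Because the question says the logs live in subdirectories, list.files() is called with recursive = TRUE.

```r
# Hypothetical helper: basis-function count and summed CPU totals for one log
extract_log <- function(path) {
  f <- readLines(path)

  # Basis-function count: the run of digits at the end of the matching line
  bline <- trimws(grep("NUMBER OF CARTESIAN GAUSSIAN BASIS FUNCTIONS", f,
                       value = TRUE))
  nbasis <- as.numeric(regmatches(bline, regexpr("[0-9]+$", bline)))

  # Timing lines: assumed to be the four lines starting two below the header
  hdr <- grep("^ *CPU timing information", f)
  rows <- f[hdr + 2:5]
  nums <- regmatches(rows, gregexpr("[0-9.]+", rows))

  # The last number on each timing line is that process's total
  totals <- vapply(nums, function(x) as.numeric(x[length(x)]), numeric(1))

  c(nbasis = nbasis, cpu_total = sum(totals))
}

# Hypothetical wrapper: one row per .log file found under dir
harvest_logs <- function(dir = ".") {
  logs <- list.files(dir, pattern = "\\.log$", recursive = TRUE,
                     full.names = TRUE)
  # vapply() returns one column per file; transpose to get one row per file
  t(vapply(logs, extract_log, numeric(2)))
}
```

Calling harvest_logs() on the top-level directory should give a matrix with the file paths as row names, which could then be passed to write.csv() instead of copying the numbers by hand. The CPU total is left in the units the log reports; divide by 60 if minutes are wanted, as in the question's own function.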