I need to read a template file test.txt, modify the contents, and then write a modified copy to disk with the name foo`i`.in (i is an iteration number). Since I need to perform this operation a large number of times (a million times wouldn't be uncommon), efficient solutions would be preferred. The template file is like this:
1
bar.out
70.000000000000000
2.000000000000000
14.850000000000000
8000.000000000000000
120.000000000000000
60.000000000000000
0.197500000000000
0.197500000000000
2.310000000000000
0.200000000000000
0.000000000000000
1.000000000000000
0.001187700000000
22.000000000000000
1.400000000000000
1.000000000000000
0.010000000000000
100
0.058600000000000
-0.217000000000000
0.078500000000000
-0.110100000000000
30
500.000000000000000
T
I don't need to modify all lines, just some of them. Specifically, I need to change bar.out to bar`i`.out, where i is an iteration index. I also need to modify some numeric lines with the following values:
parameters <- data.frame(index = c(1:10, 13:16, 21:22), variable = c("P1",
"P2", "T1", "s", "D", "L", "C1", "C2", "VA",
"pw", "m", "mw", "Cp", "Z", "ff_N", "ff_M"),
value = c(65, 4, 16.85, 7900, 110, 60, 0.1975, .1875, 2.31,
0.2, 0.0011877, 22.0, 1.4, 1.0, 0.0785, -0.1101))
All the other lines must remain the same, including the last line T. Thus, assuming I'm at the first iteration, the expected output is a text file named foo1.in with the following content (the exact number format is not important, as long as all the significant digits in parameters$value are included in foo1.in):
1
bar1.out
65.000000000000000
4.000000000000000
16.850000000000000
7900.000000000000000
110.000000000000000
60.000000000000000
0.197500000000000
0.187500000000000
2.310000000000000
0.200000000000000
0.000000000000000
1.000000000000000
0.001187700000000
22.000000000000000
1.400000000000000
1.000000000000000
0.010000000000000
100
0.058600000000000
-0.217000000000000
0.078500000000000
-0.110100000000000
30
500.000000000000000
T
Modifying foo.in and bar.out is easy:
template <- "test.txt"
infile <- "foo.in"
string1 <- "bar.out"
iteration <- 1
# build string1
elements <- strsplit(string1, "\\.")[[1]]
elements[1] <- paste0(elements[1], iteration)
string1 <- paste(elements, collapse = ".")
# build infile name
elements <- strsplit(infile, "\\.")[[1]]
elements[1] <- paste0(elements[1], iteration)
infile <- paste(elements, collapse = ".")
Now, I would like to read the template file and modify only the intended lines. The first problem I face is that read.table only returns a data frame. Since my template file contains numbers and strings in the same column, reading the whole file with read.table would give me a character column (I guess). I circumvent the problem by reading only the numeric values I'm interested in:
# read template file
temp <- read.table(template, stringsAsFactors = FALSE, skip = 2, nrows = 23)$V1
lines_to_read <- temp[length(temp)]
# modify numerical parameter values
temp[parameters$index] <- parameters$value
However, now I don't know how to write foo1.in. If I use write.table, I can only write matrices or data frames to disk, so I can't write a file which contains numbers and strings in the same column. How can I solve this?
EDIT: Here is a bit of background on this problem, to explain why I need to write this file so many times. The idea is to perform Bayesian inference for the calibration parameters of a computer code (an executable). The basic idea is simple: you have a black-box (commercial) computer code which simulates a physical problem, for example a FEM code. Let's call this code Joe. Given an input file, Joe outputs a prediction for the response of a physical system. I also have actual experimental measurements of the response of this system. I would like to find values of Joe's inputs such that the difference between Joe's outputs and the real measurements is minimized (actually things are quite different, but this is just to give an idea). In practice, this means that I need to run Joe many times with different input files, and iteratively find the input values which reduce the "discrepancy" between Joe's predictions and the experimental results. In short:
So, while Joe is a commercial code for which I only have the executable (no source), the Bayesian inference is performed in R, because R (and, for that matter, Python) has excellent tools for this kind of study.
This is probably most easily solved using a template language such as Mustache, which is implemented in R by the whisker package.
Below is an example showing how this can be done in your case. As an example, I only implemented the first three variables and bar1.out. Implementing the remaining variables should be straightforward.
library(whisker)
# You could also read the template in using readLines
# template <- readLines("template.txt")
# but to keep the example self-contained, I included it in the code
template <- "1
bar{{run}}.out
{{P1}}
{{P2}}
{{T1}}
8000.000000000000000
120.000000000000000
60.000000000000000
0.197500000000000
0.197500000000000
2.310000000000000
0.200000000000000
0.000000000000000
1.000000000000000
0.001187700000000
22.000000000000000
1.400000000000000
1.000000000000000
0.010000000000000
100
0.058600000000000
-0.217000000000000
0.078500000000000
-0.110100000000000
30
500.000000000000000
T"
# Store parameters in a list
parameters <- list(
run = 1,
P1 = 65,
P2 = 4,
T1 = 16.85)
for (i in seq_len(10)) {
# New set of parameters
parameters$run <- i
parameters$P1 <- sample(1:100, 1)
# Generate new script by rendering the template using parameters
current_script <- whisker.render(template, parameters)
writeLines(current_script, paste0("foo", i, ".in"))
# Run script
# system(...)
}
What Mustache does in this case is replace every {{<variable>}} with the corresponding value in the parameters list. More complex templating, e.g. conditional elements, is also possible.
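To make the substitution idea concrete, here is a minimal base-R sketch of the same placeholder replacement. It is not how whisker is implemented; `render_simple` and `tiny_template` are hypothetical names made up for this illustration, and the template is shortened to three lines.

```r
# Hypothetical helper (not part of whisker): replace each {{name}} placeholder
# in a template string with the matching value from a named list.
render_simple <- function(template, values) {
  for (name in names(values)) {
    placeholder <- paste0("{{", name, "}}")
    # fixed = TRUE matches the braces literally instead of as a regex
    template <- gsub(placeholder, as.character(values[[name]]),
                     template, fixed = TRUE)
  }
  template
}

# Shortened stand-in for the real template (assumption for this sketch)
tiny_template <- "bar{{run}}.out\n{{P1}}\n{{T1}}"
rendered <- render_simple(tiny_template, list(run = 3, P1 = 65, T1 = 16.85))
cat(rendered, "\n")
```

If a fixed number of decimals is needed, the values could be formatted with sprintf before they are passed in, since as.character keeps only as many digits as needed to represent the value.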
Sounds like you need custom read/write functions; not ideal, but when you have a hybrid column-like-thing, you already diverge from "neat data" (whether or not it is tidy).
Three functions that simplify what I believe you need:
read_myfile <- function(x) {
# accept either a file name or the raw text (the latter mostly during dev)
if (length(x) == 1 && file.exists(x)) x <- readLines(x)
if (length(x) == 1) x <- strsplit(x, "\n")[[1]]
# find all left-aligned NAMED rows
hdrs <- grep("[A-Za-z]", x)
hdrs <- c(1, hdrs) # ensure the first "1" is preserved
dat <- mapply(function(a,b,x) if (b >= a) as.numeric(x[seq(a, b)]),
hdrs + 1, c(hdrs[-1] - 1, length(x)), list(x),
SIMPLIFY = FALSE)
names(dat) <- trimws(x[hdrs])
dat
}
mod_myfile <- function(x, i, params) {
# sanity checks
stopifnot(
is.list(x),
is.numeric(i),
is.data.frame(params),
all(c("index", "value") %in% colnames(params))
)
isbarout <- which(names(x) == "bar.out")
stopifnot(
length(isbarout) == 1
)
x$bar.out[ params$index ] <- params$value
names(x)[isbarout] <- sprintf("bar%i.out", i)
x
}
write_myfile <- function(x, ...) {
newdat <- unlist(unname(
mapply(function(hdr, dat) c(hdr, sprintf("%25.15f ", dat)),
names(x), x, SIMPLIFY = TRUE)
))
writeLines(newdat, ...)
}
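A note on the format string used in write_myfile: "%25.15f " right-justifies each number in a 25-character field and appends a trailing space. The short sketch below only illustrates what that produces for a single value; the unpadded "%.15f" variant is my suggestion for the case where the padding is unwanted, not something the function above uses.

```r
padded <- sprintf("%25.15f ", 65)  # the format used in write_myfile
plain  <- sprintf("%.15f", 65)     # same precision, no padding

nchar(padded)  # field width of 25 plus the trailing space
plain
```

Whether the consuming program accepts leading blanks is an assumption worth checking against one hand-written input file.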
Usage is straightforward. I'll start with a single character string to emulate the input template (the read function works equally well with a character string as it does with a file name):
rawfile <- "1
bar.out
70.000000000000000
2.000000000000000
14.850000000000000
8000.000000000000000
120.000000000000000
60.000000000000000
0.197500000000000
0.197500000000000
2.310000000000000
0.200000000000000
0.000000000000000
1.000000000000000
0.001187700000000
22.000000000000000
1.400000000000000
1.000000000000000
0.010000000000000
100
0.058600000000000
-0.217000000000000
0.078500000000000
-0.110100000000000
30
500.000000000000000
T
"
To start, just read the data:
dat <- read_myfile(rawfile)
# dat <- read_myfile("file.in")
str(dat)
# List of 3
# $ 1 : NULL
# $ bar.out: num [1:24] 70 2 14.8 8000 120 ...
# $ T : NULL
You will somehow determine how the parameters should be changed. I'll use your previous data:
parameters <- data.frame(
index = c(1:10, 13:16, 21:22),
variable = c("P1", "P2", "T1", "s", "D", "L", "C1", "C2",
"VA", "pw", "m", "mw", "Cp", "Z", "ff_N", "ff_M"),
value = c(65, 4, 16.85, 7900, 110, 60, 0.1975, .1875, 2.31,
0.2, 0.0011877, 22.0, 1.4, 1.0, 0.0785, -0.1101)
)
The first argument is the output from read_myfile; the second is the iteration number to insert into bar.out; the third is this parameters data.frame:
newdat <- mod_myfile(dat, 32, parameters)
str(newdat)
# List of 3
# $ 1 : NULL
# $ bar32.out: num [1:24] 65 4 16.9 7900 110 ...
# $ T : NULL
And now write it out.
write_myfile(newdat, sprintf("foo%d.in", 32))
I don't know how @GiovanniRighi's performance will compare when run in a single R session, but 1000 of these files takes less than 7 seconds on my computer.
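If you want to measure this yourself, the sketch below shows one way to time a batch of writes with system.time. It uses a more compact variant that keeps the whole file as a character vector (as readLines would return it) rather than the three functions above. `write_input` and the shortened `template_lines` are hypothetical, made up for this illustration, and the files go to tempdir() so nothing is written to the working directory.

```r
# Shortened stand-in for readLines("test.txt") (assumption for this sketch)
template_lines <- c("1", "bar.out", "70.000000000000000",
                    "2.000000000000000", "14.850000000000000", "T")

# Hypothetical helper: 'index' counts the numeric lines, which start at
# file line 3, hence the offset of 2.
write_input <- function(lines, i, index, value, dir = tempdir()) {
  lines[2] <- sprintf("bar%d.out", i)
  lines[index + 2] <- sprintf("%.15f", value)
  path <- file.path(dir, sprintf("foo%d.in", i))
  writeLines(lines, path)
  path
}

# Time 100 iterations; the returned paths are kept for inspection
elapsed <- system.time(
  paths <- vapply(1:100, function(i) {
    write_input(template_lines, i, index = 1:2, value = c(65, 4))
  }, character(1))
)
elapsed[["elapsed"]]
readLines(paths[1])
```

Timings will depend heavily on the disk and operating system, so treat any number from this as specific to your machine.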