
Reading a template file and writing it to disk after some modifications

Tags: file-io, r

I need to read a template file test.txt, modify its contents, and then write a modified copy to disk under the name foo`i`.in (where i is the iteration number). Because I need to perform this operation a large number of times (a million would not be unusual), efficient solutions are preferred. The template file looks like this:

1 
bar.out 
       70.000000000000000 
        2.000000000000000 
       14.850000000000000 
     8000.000000000000000 
      120.000000000000000 
       60.000000000000000 
        0.197500000000000 
        0.197500000000000 
        2.310000000000000 
        0.200000000000000 
        0.000000000000000 
        1.000000000000000 
        0.001187700000000 
       22.000000000000000 
        1.400000000000000 
        1.000000000000000 
        0.010000000000000 
100 
        0.058600000000000 
       -0.217000000000000 
        0.078500000000000 
       -0.110100000000000 
30 
      500.000000000000000 
T 

I don't need to modify all lines, just some of them. Specifically, I need to modify bar.out to bar`i`.out where i is an iteration index. I also need to modify some numeric lines with the following values:

parameters <- data.frame(index = c(1:10, 13:16, 21:22), variable = c("P1", 
                      "P2", "T1", "s", "D", "L", "C1", "C2", "VA", 
                      "pw", "m", "mw", "Cp", "Z", "ff_N", "ff_M"),
                      value = c(65, 4, 16.85, 7900, 110, 60, 0.1975, .1875, 2.31,
                                 0.2, 0.0011877, 22.0, 1.4, 1.0, 0.0785, -0.1101))

All the other lines must remain the same, including the last line T. Thus, assuming I'm at the first iteration, the expected output is a text file named foo1.in having the content (the exact number format is not important, as long as all the significant digits in parameters$value are included in foo1.in):

1 
bar1.out 
       65.000000000000000 
        4.000000000000000
       16.850000000000000 
     7900.000000000000000 
      110.000000000000000 
       60.000000000000000 
        0.197500000000000 
        0.187500000000000 
        2.310000000000000 
        0.200000000000000 
        0.000000000000000 
        1.000000000000000 
        0.001187700000000 
       22.000000000000000 
        1.400000000000000 
        1.000000000000000 
        0.010000000000000 
100 
        0.058600000000000 
       -0.217000000000000 
        0.078500000000000 
       -0.110100000000000 
30 
      500.000000000000000 
T 

Modifying foo.in and bar.out is easy:

template  <- "test.txt"
infile    <- "foo.in"
string1 <- "bar.out"
iteration <- 1

# build string1
elements <- strsplit(string1, "\\.")[[1]]
elements[1] <- paste0(elements[1], iteration)
string1 <- paste(elements, collapse = ".")

# build infile name
elements <- strsplit(infile, "\\.")[[1]]
elements[1] <- paste0(elements[1], iteration)
infile <- paste(elements, collapse = ".")

Now, I would like to read the template file and modify only the intended lines. The first problem I face is that read.table only returns a data frame. Since my template file contains numbers and strings in the same column, reading the whole file with read.table would give me a character column (I guess). I work around the problem by reading only the numeric values I'm interested in:

    # read template file   
    temp <- read.table(template, stringsAsFactors = FALSE, skip = 2, nrows = 23)$V1
    lines_to_read <- temp[length(temp)]

    # modify numerical parameter values
    temp[parameters$index] <- parameters$value

However, now I don't know how to write foo1.in. If I use write.table, I can only write matrices or dataframes to disk, so I can't write a file which contains numbers and strings in the same column. How can I solve this?

EDIT: Here is some background on the problem, to explain why I need to write this file so many times. The goal is to perform Bayesian inference for the calibration parameters of a computer code (an executable). The basic idea is simple: you have a black-box (commercial) computer code that simulates a physical problem, for example a FEM code. Let's call this code Joe. Given an input file, Joe outputs a prediction for the response of a physical system. I also have actual experimental measurements of that response. I would like to find the values of Joe's inputs that minimize the difference between Joe's outputs and the real measurements (the actual procedure is quite different, but this gives the idea). In practice, this means that I need to run Joe many times with different input files and iteratively find the input values that reduce the "discrepancy" between Joe's predictions and the experimental results. In short:

  1. I need to generate many input (text) files
  2. I don't know in advance the contents of the input files. The numerical parameters are modified during the optimization in an iterative way.
  3. I also need to read Joe's output for each input. This is actually another problem and I'll probably write a specific question on this point.

So, while Joe is a commercial code for which I only have the executable (no source), the Bayesian inference is performed in R, because R (and, for that matter, Python) has excellent tools for this kind of study.

DeltaIV asked Feb 21 '17 17:02


2 Answers

This is probably most easily solved using a template language such as Mustache, which is implemented in R by the whisker package.

Below is an example showing how this can be done in your case. To keep it short, I only implemented the first three variables and the bar1.out file name. Implementing the remaining variables should be straightforward.

library(whisker)


# You could also read the template in using readLines
# template <- readLines("template.txt")
# but to keep the example self-contained, I included it in the code
template <- "1 
bar{{run}}.out 
      {{P1}}
      {{P2}}
      {{T1}}
     8000.000000000000000 
      120.000000000000000 
       60.000000000000000 
        0.197500000000000 
        0.197500000000000 
        2.310000000000000 
        0.200000000000000 
        0.000000000000000 
        1.000000000000000 
        0.001187700000000 
       22.000000000000000 
        1.400000000000000 
        1.000000000000000 
        0.010000000000000 
100 
        0.058600000000000 
       -0.217000000000000 
        0.078500000000000 
       -0.110100000000000 
30 
      500.000000000000000 
T"


# Store parameters in a list
parameters <- list(
  run = 1, 
  P1 = 65,
  P2 = 4,
  T1 = 16.85)

for (i in seq_len(10)) {
  # New set of parameters
  parameters$run <- i
  parameters$P1  <- sample(1:100, 1)

  # Generate new script by rendering the template using parameters
  current_script <- whisker.render(template, parameters)
  writeLines(current_script, paste0("foo", i, ".in"))

  # Run script
  # system(...)
}

What Mustache does in this case is replace every {{<variable>}} tag with the corresponding value from the parameters list. More complex templating, such as conditional elements, is also possible.
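To make the substitution concrete, here is a minimal base R sketch that mimics the replacement step for plain {{name}} tags. It is not how whisker is implemented: render_simple is a hypothetical helper, and pre-formatting the numbers with sprintf is only an assumption about how you might keep the fixed-width layout of the original file.

```r
# Hypothetical helper: replace each {{name}} tag with the matching value.
# Handles plain variable tags only (no sections, no escaping).
render_simple <- function(template, values) {
  for (name in names(values)) {
    tag <- paste0("{{", name, "}}")
    template <- gsub(tag, values[[name]], template, fixed = TRUE)
  }
  template
}

# Shortened stand-in for the real template, one element per line
template <- c("1", "bar{{run}}.out", "{{P1}}", "{{P2}}", "T")

# Pre-format the numbers as strings so the output keeps 15 decimals
values <- list(
  run = "1",
  P1  = sprintf("%25.15f", 65),
  P2  = sprintf("%25.15f", 4)
)

rendered <- render_simple(template, values)
print(rendered)
```

Passing strings rather than raw numbers leaves the number format under your control instead of relying on default number-to-text conversion.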

Jan van der Laan answered Nov 01 '22 09:11


Sounds like you need custom read/write functions; not ideal, but when you have a hybrid column-like-thing, you already diverge from "neat data" (whether or not it is tidy).
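As a quick illustration of why the table readers are awkward for this layout, the sketch below writes a shortened stand-in for the template to a temporary file and reads it back both ways. The four-line file and the variable names are made up for the example.

```r
# Shortened stand-in for the template: numbers and strings in one column
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 ", "bar.out ", "       70.000000000000000 ", "T "), tmp)

# read.table has to pick one type for the column
mixed <- read.table(tmp, stringsAsFactors = FALSE)
class(mixed$V1)   # the whole column is read as character

# readLines keeps each line as-is, which suits line-by-line editing
raw_lines <- readLines(tmp)
length(raw_lines)
```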

Three functions that simplify what I believe you need:

read_myfile <- function(x) {
  # mostly during dev
  if (file.exists(x)) x <- readLines(x)
  if (length(x) == 1) x <- strsplit(x, "\n")[[1]]
  # find all left-aligned NAMED rows
  hdrs <- grep("[A-Za-z]", x)
  hdrs <- c(1, hdrs) # ensure the first "1" is preserved
  dat <- mapply(function(a,b,x) if (b >= a) as.numeric(x[seq(a, b)]),
                hdrs + 1, c(hdrs[-1] - 1, length(x)), list(x),
                SIMPLIFY = FALSE)
  names(dat) <- trimws(x[hdrs])
  dat
}

mod_myfile <- function(x, i, params) {
  # sanity checks
  stopifnot(
    is.list(x),
    is.numeric(i),
    is.data.frame(params),
    all(c("index", "value") %in% colnames(params))
  )
  isbarout <- which(names(x) == "bar.out")
  stopifnot(
    length(isbarout) == 1
  )
  x$bar.out[ params$index ] <- params$value
  names(x)[isbarout] <- sprintf("bar%i.out", i)
  x
}

write_myfile <- function(x, ...) {
  newdat <- unlist(unname(
    mapply(function(hdr, dat) c(hdr, sprintf("%25.15f ", dat)),
           names(x), x, SIMPLIFY = TRUE)
  ))
  writeLines(newdat, ...)
}

The use is straightforward. I'll start with a single character string to emulate the input template (the read function works equally well with a character string as it does with a file name):

rawfile <- "1 
bar.out 
       70.000000000000000 
        2.000000000000000 
       14.850000000000000 
     8000.000000000000000 
      120.000000000000000 
       60.000000000000000 
        0.197500000000000 
        0.197500000000000 
        2.310000000000000 
        0.200000000000000 
        0.000000000000000 
        1.000000000000000 
        0.001187700000000 
       22.000000000000000 
        1.400000000000000 
        1.000000000000000 
        0.010000000000000 
100 
        0.058600000000000 
       -0.217000000000000 
        0.078500000000000 
       -0.110100000000000 
30 
      500.000000000000000 
T 
"

To start, just read the data:

dat <- read_myfile(rawfile)
# dat <- read_myfile("file.in")
str(dat)
# List of 3
#  $ 1      : NULL
#  $ bar.out: num [1:24] 70 2 14.8 8000 120 ...
#  $ T      : NULL

You will somehow determine how the parameters should be changed. I'll use your previous data:

parameters <- data.frame(
  index = c(1:10, 13:16, 21:22),
  variable = c("P1", "P2", "T1", "s", "D", "L", "C1", "C2",
               "VA", "pw", "m", "mw", "Cp", "Z", "ff_N", "ff_M"),
  value = c(65, 4, 16.85, 7900, 110, 60, 0.1975, .1875, 2.31,
            0.2, 0.0011877, 22.0, 1.4, 1.0, 0.0785, -0.1101)
)

The first argument is the output of read_myfile; the second is the iteration number to insert into the bar.out name; the third is this parameters data.frame:

newdat <- mod_myfile(dat, 32, parameters)
str(newdat)
# List of 3
#  $ 1        : NULL
#  $ bar32.out: num [1:24] 65 4 16.9 7900 110 ...
#  $ T        : NULL

And now write it out.

write_myfile(newdat, sprintf("foo%d.in", 32))

I don't know how @GiovanniRighi's performance will compare when run in a single R session, but writing 1000 of these files takes less than 7 seconds on my computer.
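On the efficiency question, a lighter-weight variant (a sketch only, not benchmarked here) skips parsing altogether: read the template once as a character vector, then overwrite the affected lines by position on each iteration. The helper write_input and its arguments are hypothetical, and the code assumes the value lines start at line 3, as in the template from the question.

```r
# Shortened stand-in for: template_lines <- readLines("test.txt")
template_lines <- c("1", "bar.out",
                    sprintf("%25.15f", c(70, 2, 14.85)),
                    "T")

# Hypothetical helper: patch a copy of the template and write it to disk.
# `index` counts value lines, so index k maps to file line k + 2.
write_input <- function(lines, i, index, value, dir = tempdir()) {
  lines[2] <- sprintf("bar%d.out", i)            # rename the output file
  lines[index + 2] <- sprintf("%25.15f", value)  # overwrite selected values
  path <- file.path(dir, sprintf("foo%d.in", i))
  writeLines(lines, path)
  path
}

out <- write_input(template_lines, 1L, index = c(1, 3), value = c(65, 16.85))
res <- readLines(out)
print(res)
```

Because the template is read only once, each iteration costs two vector assignments and one writeLines call; whether that is faster in practice would need to be measured on your own data.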

r2evans answered Nov 01 '22 10:11