
Passing arguments to R script in command line (shell/bash): what to do when column names contain tilde (~)

I'm using Rscript to run an R script from bash, and I want to pass command-line arguments through to functions within the script. Specifically, I want to pass arguments that specify:

  • path to data file (.csv) and
  • certain column names in that data file.

I run into a problem when a column name includes the tilde sign (~). I've tried wrapping the column names in backticks, but that didn't help.

Example

I want to write a script that takes in a data file in .csv format and plots a histogram for one variable according to the user's choice.

Here's my function:

plot_histogram <- function(path_to_input, x_var) {
  
  data_raw <- read.csv(file = path_to_input)
  
  path_to_output_folder <- dirname(path_to_input)
  
  png(filename = paste0(path_to_output_folder, "/", "output_plot.png"))
  
  hist(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", xlab = "my_var")
  
  dev.off()  # close the png device once; a second dev.off() would hit the null device and error
}

Let's run it on some fake data

set.seed(123)
df <- data.frame(age = sample(20:80, size = 100, replace = TRUE))

write.csv(df, "some_age_data.csv")

plot_histogram(path_to_input = "some_age_data.csv",
               x_var = "age")

As intended, I get a .png file with the plot, saved to the same directory as the .csv.

Now adapt it into an R script that can be run from the command line

plot_histogram.R

args <- commandArgs(trailingOnly = TRUE)

## same function as above
plot_histogram <- function(path_to_input, x_var) {
  
  data_raw <- read.csv(file = path_to_input)
  path_to_output_folder <- dirname(path_to_input)
  png(filename = paste0(path_to_output_folder, "/", "output_plot.png"))
  hist(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", xlab = "my_var")
  dev.off()  # close the png device once; a second dev.off() would hit the null device and error
}

plot_histogram(path_to_input = args[1], x_var = args[2])

Then run via command line using Rscript

$ Rscript --vanilla plot_histogram.R /../../../some_age_data.csv "age"

Works too!

However, things break if the column name contains a tilde

Step 1: create fake data

library(tibble)

set.seed(123)
df <- tibble(`age-blah~value` = sample(20:80, size = 100, replace = TRUE))

write.csv(df, "some_age_data.csv")

Step 2: Using Rscript:

$ Rscript --vanilla plot_histogram.R /../../../some_age_data.csv "age-blah~value"

Error in hist.default(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", :
  invalid number of 'breaks'
Calls: plot_histogram -> hist -> hist.default
Execution halted

Bottom Line

When using Rscript, how can I pass an argument that specifies a column name containing a tilde? Alternatively, how can I work with .csv files whose column names contain a tilde, while still using Rscript?

Thanks!

asked Oct 29 '20 20:10 by Emman



1 Answer

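You are successfully passing an argument that specifies a column name containing a tilde. However, read.csv has "fixed" the column names, so the column in the data frame no longer contains a tilde.

By default (check.names = TRUE), read.csv silently converts the column name to age.blah.value, so data_raw[[x_var]] finds no column with the name you passed and hist ends up with nothing to plot. Use check.names = FALSE to keep the name as age-blah~value.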

data_raw <- read.csv(file = path_to_input, check.names = FALSE)
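To see the renaming for yourself, here is a minimal sketch that writes a small CSV to a temporary directory and reads it back both ways. The file name tilde_demo.csv and the helper get_column are hypothetical, made up for this illustration; the guard at the end is an optional extra and not part of the original script.

```r
# Throwaway file in tempdir(); the file name is made up for this demo
csv_path <- file.path(tempdir(), "tilde_demo.csv")
col_name <- "age-blah~value"

set.seed(123)
df <- data.frame(sample(20:80, size = 100, replace = TRUE))
names(df) <- col_name                  # assign the name directly so it is kept as-is
write.csv(df, csv_path, row.names = FALSE)

# Default: check.names = TRUE passes the header through make.names()
default_read <- read.csv(csv_path)
names(default_read)                    # "age.blah.value"
is.null(default_read[[col_name]])      # TRUE: no column has the requested name

# check.names = FALSE keeps the header verbatim
verbatim_read <- read.csv(csv_path, check.names = FALSE)
names(verbatim_read)                   # "age-blah~value"

# Optional guard (hypothetical helper): fail early with a readable message
get_column <- function(data, x_var) {
  if (!x_var %in% names(data)) {
    stop("Column '", x_var, "' not found. Available columns: ",
         paste(names(data), collapse = ", "), call. = FALSE)
  }
  data[[x_var]]
}
length(get_column(verbatim_read, col_name))  # 100
```

With the default read, data_raw[[x_var]] is NULL, which is presumably what leads to the "invalid number of 'breaks'" error further down in hist. A check like get_column would report the missing column name instead.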
answered Oct 24 '22 15:10 by Paul