
Combining R + awk + bash commands

Tags:

r

I want to combine awk and R. I have a set of *.txt files in a given directory, and I don't know the length of the header in each file: in some cases I have to skip 25 lines, in others 27, and so on. So I want to use an awk command to get the number of lines to skip. Once I have this value, I can begin processing the data with R.

Furthermore, in the R file I combine R and bash, so my code looks like this:

#!/usr/bin/env Rscript
...
argv <- commandArgs(TRUE)
# error checking...
import_file <- argv[1]
export_file <- argv[2]
# your function call
format_windpro(import_file, export_file)

Where and how can I put my awk command? Thanks!

I tried what you suggested about awk commands and I still get an error. R doesn't recognize my command, so I cannot pass the number of lines to skip to my function. Here is my code:

nline <- paste('$(grep -n 'm/s' import_file |awk -F":" '{print $1}')')

nline <- scan(pipe(nline),quiet=T)

I search for the pattern m/s in the first column to find out where my header text is. I am using R on Windows 7.

JPV asked Mar 02 '12 10:03


2 Answers

Besides Vincent's hint of using system("awk ...", intern=TRUE), you can also use the pipe() function that is part of the usual text connections:

R> sizes <- read.table(pipe("ls -l /tmp | awk '!/^total/ {print $5}'"))
R> summary(sizes)
       V1          
 Min.   :       0  
 1st Qu.:     482  
 Median :    4096  
 Mean   :   98746  
 3rd Qu.:   13952  
 Max.   :27662342  
R> 

Here I pipe the output of a command into awk and then read all of awk's output. The output could also be a single line:

R> cmd <- "ls -l /tmp | awk '!/^total/ {sum = sum + $5} END {print sum}'"
R> totalsize <- scan(pipe(cmd), quiet=TRUE)
R> totalsize
[1] 116027050
R> 
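
As a rough sketch of how the same single-value idea might apply to the header-length problem in the question, the awk side can be tried in a shell before it is wired into R. The sample file contents and the names sample, pat, nline_grep and nline_awk are made up for illustration, and it is assumed that the header is marked by the first line containing m/s:

```shell
#!/bin/sh
# Hypothetical sample file: two header lines, a units line containing "m/s", then data.
sample=$(mktemp)
printf '%s\n' 'Site: demo' 'Height: 80 m' 'speed[m/s] dir[deg]' '5.1 270' '6.3 265' > "$sample"

# Variant 1: grep -n prefixes each match with "LINE:"; awk keeps the number of the first match.
nline_grep=$(grep -n 'm/s' "$sample" | awk -F: 'NR == 1 { print $1 }')

# Variant 2: awk alone. index() does a plain substring search, so the slash needs no escaping.
nline_awk=$(awk -v pat='m/s' 'index($0, pat) { print NR; exit }' "$sample")

echo "$nline_grep $nline_awk"
rm -f "$sample"
```

Either command, written as one R string with the file name pasted in, could then be read with scan(pipe(cmd), quiet=TRUE) as above. Whether that number or one less is the right count of lines to skip depends on whether the matching line itself belongs to the header. On Windows this assumes that grep and awk are installed and on the PATH.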
Dirk Eddelbuettel answered Nov 15 '22 21:11


You can use system to run an external program from R.

system("gawk --version", intern=TRUE)
Vincent Zoonekynd answered Nov 15 '22 20:11