 

Mixing other languages with R

Tags:

unix

r

I use R for most of my statistical analysis. However, cleaning and processing data, especially at sizes of 1 GB+, is quite cumbersome in R, so I use common UNIX tools for that. My question is: is it possible to run them interactively in the middle of an R session? An example: say file1 is the output dataset from an R process, with 100 rows. For my next R process I need file2, a specific subset of columns 1 and 2 of file1, which can be easily extracted with cut and awk. So the workflow is something like:

Some R process => file1
cut --fields=1,2 <file1 | awk something something >file2
Next R process using file2
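The shell step above can be run from inside R without leaving the session, via system() (or pipe(), as in the answer below). A minimal sketch, where the data frame, the file names file1/file2, and the awk filter '$2 > 0' are all illustrative stand-ins:

```r
# toy stand-in for the real R result; file names follow the question
df <- data.frame(a = 1:3, b = c(-1, 2, 5), c = letters[1:3])
write.table(df, "file1", row.names = FALSE, col.names = FALSE, sep = "\t")

# run the cut/awk step in a shell without leaving the R session;
# '$2 > 0' is a placeholder for the real awk condition
system("cut -f1,2 file1 | awk '$2 > 0' > file2")

# continue the next R process from the filtered file
df2 <- read.table("file2")
```

Here df2 contains only columns 1 and 2 of the rows that passed the awk filter.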

Apologies in advance if this is a foolish question.

asked Nov 30 '22 16:11 by user702432

1 Answer

Try this (adding other read.table arguments if needed):

# 1
DF <- read.table(pipe("cut -f1,2 < data.txt | awk something_else"))

or in pure R:

# 2
DF <- read.table("data.txt")[1:2]

or, to avoid reading the unwanted fields at all (assuming there are 4 fields):

# 3
DF <- read.table("data.txt", colClasses = c(NA, NA, "NULL", "NULL"))

The last line could be modified for the case where we know we want the first two fields but don't know how many other fields there are:

# 3a
n <- count.fields("data.txt")[1]
read.table("data.txt", header = TRUE, colClasses = c(NA, NA, rep("NULL", n-2)))
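A quick self-check that the subsetting and colClasses approaches agree, using a made-up four-field file:

```r
# four-field sample data, invented for illustration only
writeLines(c("1 2 x y", "3 4 x y"), "data.txt")

# approach 2: read everything, then keep the first two columns
a <- read.table("data.txt")[1:2]

# approach 3: skip the last two fields while reading
b <- read.table("data.txt", colClasses = c(NA, NA, "NULL", "NULL"))

# both should hold just the first two fields
identical(a, b)
```

The difference is memory use: the first form parses all four fields before discarding two, while the colClasses form never stores them.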

The sqldf package can also be used. This example assumes a csv file, data.csv, in which the desired fields are called a and b. If it's not a csv file, pass appropriate arguments to read.csv.sql to specify another separator, etc.:

# 4
library(sqldf)
DF <- read.csv.sql("data.csv", sql = "select a, b from file")
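For instance, for a tab-separated file the separator can be passed through the sep argument; the file name data.tsv and column names a and b are assumptions:

```r
library(sqldf)

# same query as above, but for a tab-delimited file
DF <- read.csv.sql("data.tsv", sql = "select a, b from file", sep = "\t")
```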
answered Dec 21 '22 16:12 by G. Grothendieck