I am reading a CSV file into R that looks like this:
3,3
3,2
3,3
3,3
3,3
3,3
2,3
1,2
2,2
3,3
I want to assign a number to each of the 9 unique combinations that my data can take (3 and 3 is 9, 3 and 2 is 8, 2 and 3 is 6, etc.). I have been trying to design a nested if statement that will evaluate each row and assign a number in a third column, for every row in the data set. I believe this can be done with the apply function, but I am having trouble getting the if statement to work within the apply function. The two columns both have possible values of 1, 2, or 3. This is my code thus far, just trying to assign a 9 to the 3/3 rows and 0 to everything else:
#RScript for haplotype analysis
#remove(list=ls())
options(stringsAsFactors=FALSE)
setwd("C:/Documents and Settings/ColumbiaPC/Desktop")
#read in comma-delimited, ID-matched genotype data
OXT <- read.csv("OXTRhaplotype.csv")
colnames(OXT)<- c("OXT1","OXT2")
OXT$HAP <- apply(OXT, 1, function(x) if(x[1]=="3"&&x[2]=="3")x[3]=="9" else 0))
Thanks for any help in advance.
You can solve the problem you describe using a matrix and standard R subsetting, without any if statements:
m <- matrix(1:9, nrow=3, byrow=TRUE)
m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
This means you can index m using matrix subsetting:
m[3, 2]
[1] 8
m[3,3]
[1] 9
m[2,3]
[1] 6
And now you can apply this to your data:
df <- structure(list(V1 = c(3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 2L, 3L),
V2 = c(3L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 3L)), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -10L))
#df$m <- sapply(seq_len(nrow(df)), function(i)m[df$V1[i], df$V2[i]])
df$m <- m[as.matrix(df)] # Use matrix subsetting, suggested by @Aaron
df
V1 V2 m
1 3 3 9
2 3 2 8
3 3 3 9
4 3 3 9
5 3 3 9
6 3 3 9
7 2 3 6
8 1 2 2
9 2 2 5
10 3 3 9
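Because the lookup matrix is filled row by row with 1 to 9, the same codes can also be computed arithmetically. The following is only a sketch under that assumption (the short toy data frame is made up for illustration, and the formula breaks as soon as the matrix holds anything other than 1:9 in row order):

```r
# Toy data using the same column names (V1, V2) as the answer above
df <- data.frame(V1 = c(3L, 3L, 2L, 1L, 2L),
                 V2 = c(3L, 2L, 3L, 2L, 2L))

# Lookup table: row index = first genotype, column index = second genotype
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# A two-column matrix used as an index picks one cell per row
by_lookup <- m[as.matrix(df)]

# Equivalent arithmetic, valid only because m is 1:9 filled by row
by_formula <- (df$V1 - 1L) * 3L + df$V2

print(by_lookup)
print(by_formula)
```

The matrix version is more general: if the codes ever stop following a simple pattern, only the contents of m need to change.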
Andrie's already answered your question by showing a better approach to your problem. But there are a few mistakes in your original code that I want to mention.
First, & is not the same as &&: & compares element by element, while && is meant for a single TRUE/FALSE value on each side. See ?'&' for more. I believe you wanted to use & in your example.
Second, == is used for tests of equality, which you do correctly at first in your example. It is not used for assignment, which is what you attempt when assigning "9" to x[3]. Assignment is handled by <-, whether inside or outside functions. See ?'==' and ?'<-' for more.
Third, assigning a value to x[3] within the apply() function does not make sense. apply() simply returns an array. It does not modify the OXT object. Below is an example of how your original approach might look. However, Andrie's method is probably better for you.
OXT <- read.table(textConnection(
"3 3
3 2
3 3
3 3
3 3
3 3
2 3
1 2
2 2
3 3"))
colnames(OXT)<- c("OXT1","OXT2")
OXT$HAP <- apply(OXT, 1, function(x)
{
  if(x[1] == 3 & x[2] == 3) result <- 9
  else if(x[1] == 3 & x[2] == 2) result <- 8
  else if(x[1] == 3 & x[2] == 1) result <- 7
  else result <- 0
  return(result)
})
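The three points above can be checked in isolation with a small sketch. The vectors a, b, x and the data frame toy are made-up illustrations, not part of the original data:

```r
# 1. & works element by element; && expects one TRUE/FALSE on each side
a <- c(3, 3, 2)
b <- c(3, 2, 3)
both_three <- a == 3 & b == 3         # one result per element
first_only <- a[1] == 3 && b[1] == 3  # a single result

# 2. == only tests; <- is what actually changes a value
x <- c(3, 3, 0)
is_nine <- x[3] == 9  # a comparison, x is left unchanged
x[3] <- 9             # an assignment, x[3] is now 9

# 3. apply() returns a new vector and leaves its input untouched
toy <- data.frame(OXT1 = c(3, 1), OXT2 = c(3, 2))
res <- apply(toy, 1, function(row) if (row[1] == 3 && row[2] == 3) 9 else 0)

print(both_three)
print(res)
print(ncol(toy))  # still 2: no column was added until res is assigned back
```

Only an explicit assignment such as toy$HAP <- res adds the third column.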
Unfortunately, I came late, and with a solution similar to @Andrie's:
# genotype pairs, one row per subject
dat <- matrix(c(3,3,3,2,3,3,3,3,3,3,3,3,2,3,1,2,2,2,3,3),
nrow=10, byrow=TRUE)
# here is our lookup table for genotypes
pat <- matrix(1:9, nrow=3, byrow=TRUE, dimnames=list(1:3,1:3))
Then
> pat[dat]
[1] 9 8 9 9 9 9 6 2 5 9
gives you what you want.
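Since the lookup table carries dimnames, it can also be indexed with a two-column character matrix, which may help if the genotypes were read in as strings (as the comparisons against "3" in the question suggest). This is a sketch with a made-up character matrix chr, not the original data:

```r
# Lookup table with character dimnames "1", "2", "3"
pat <- matrix(1:9, nrow = 3, byrow = TRUE,
              dimnames = list(as.character(1:3), as.character(1:3)))

# Hypothetical genotype pairs stored as strings, one pair per row
chr <- cbind(c("3", "1", "2"),
             c("2", "2", "3"))

# A character matrix with one column per dimension is matched against dimnames
codes <- pat[chr]
print(codes)
```

With numeric data, as in dat above, the dimnames are not needed and plain pat[dat] is enough.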
However, I would like to say that you might find it easier to use a dedicated package for genetic studies, such as those found on CRAN (genetics, gap, or SNPassoc, to name a few) or Bioconductor, because they include facilities for transforming/recoding genotype data and working with haplotypes.
Here is an example of what I have in mind with the above remark:
> library(genetics)
> geno1 <- as.genotype.allele.count(dat[,1]-1)
> geno2 <- as.genotype.allele.count(dat[,2]-1)
> table(geno1, geno2)
     geno2
geno1 A/A A/B
  A/A   6   1
  A/B   1   1
  B/B   0   1