Performing an if statement on each row in R

I am reading a CSV file into R that looks like this:

3,3
3,2
3,3
3,3
3,3
3,3
2,3
1,2
2,2
3,3

I want to assign a number to each of the 9 unique combinations that a row of my data can take (3 and 3 is 9, 3 and 2 is 8, 2 and 3 is 6, etc.). Both columns have possible values of 1, 2, or 3. I have been trying to design a nested if statement that evaluates each row and assigns the number in a third column, for every row in the data set. I believe this can be done with the apply function, but I am having trouble getting the if statement to work within it. This is my code thus far, just trying to assign a 9 to rows where both columns are 3 and a 0 to everything else:

#RScript for haplotype analysis

#remove(list=ls())
options(stringsAsFactors=FALSE)
setwd("C:/Documents and Settings/ColumbiaPC/Desktop")

#read in comma-delimited, ID-matched genotype data
OXT <- read.csv("OXTRhaplotype.csv")
colnames(OXT)<- c("OXT1","OXT2")

OXT$HAP <- apply(OXT, 1, function(x) if(x[1]=="3"&&x[2]=="3")x[3]=="9" else 0))

Thanks for any help in advance.

asked May 04 '11 16:05 by Bill


3 Answers

You can solve the problem you describe using a lookup matrix and standard R subsetting, without any if statements:

m <- matrix(1:9, nrow=3, byrow=TRUE)
m

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

This means you can index m using matrix subsetting:

m[3, 2]
[1] 8

m[3,3]
[1] 9

m[2,3]
[1] 6

And now you can apply this to your data:

df <- structure(list(V1 = c(3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 2L, 3L), 
        V2 = c(3L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 3L)), .Names = c("V1", 
        "V2"), class = "data.frame", row.names = c(NA, -10L))

#df$m <- sapply(seq_len(nrow(df)), function(i)m[df$V1[i], df$V2[i]])
df$m <- m[as.matrix(df)]  # Use matrix subsetting, suggested by @Aaron
df

   V1 V2 m
1   3  3 9
2   3  2 8
3   3  3 9
4   3  3 9
5   3  3 9
6   3  3 9
7   2  3 6
8   1  2 2
9   2  2 5
10  3  3 9
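The m[as.matrix(df)] line relies on how R's [ operator treats a two-column numeric matrix used as an index: each row of the index is read as one (row, column) pair. A minimal sketch of how that differs from m[rows, cols], where idx, pairs and block are names made up for illustration:

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# Hypothetical index matrix: each row is one (row, column) pair
idx <- cbind(c(3, 3, 2), c(3, 2, 3))

# One value per pair: m[3,3], m[3,2], m[2,3]
pairs <- m[idx]
pairs
# [1] 9 8 6

# By contrast, separate row and column vectors select a whole sub-matrix
block <- m[c(3, 3, 2), c(3, 2, 3)]
dim(block)
# [1] 3 3
```

Note that this only works while the index has exactly two columns. If the data frame carries other columns, selecting the two index columns first, for example as.matrix(df[, c("V1", "V2")]), is probably the safer form.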
answered Oct 10 '22 11:10 by Andrie


Andrie's already answered your question by showing a better approach to your problem. But there are a few mistakes in your original code that I want to mention.

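First, & is not the same as &&. & compares vectors element by element, while && expects single values and stops evaluating as soon as the result is known. See ?'&' for more. I believe you wanted to use & in your example.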

Second, == tests for equality, which is how you use it at the start of your example. It does not assign anything, so x[3]=="9" does not put "9" into x[3]. Assignment is handled by <-, whether inside or outside functions. See ?'==' and ?'<-' for more.

Third, assigning a value to x[3] within the function passed to apply() does not make sense. apply() simply returns the results as a vector, array, or list; it does not modify the OXT object. Below is an example of how your original approach might look. However, Andrie's method is probably better for you.

OXT <- read.table(textConnection(
    "3 3
    3 2
    3 3
    3 3
    3 3
    3 3
    2 3
    1 2
    2 2
    3 3"))
colnames(OXT)<- c("OXT1","OXT2")

OXT$HAP <- apply(OXT, 1, function(x)
    {
        if(x[1] == 3 & x[2] == 3) result <- 9
        else if(x[1] == 3 & x[2] == 2) result <- 8
        else if(x[1] == 3 & x[2] == 1) result <- 7
        else result <- 0
        return(result)
    })
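Because & works element by element, the same rules can also be written over whole columns, with no apply() at all. A sketch, assuming the OXT1/OXT2 column names from above and the same partial set of rules (any combination not listed gets 0); is33, is32 and is31 are names made up for illustration:

```r
OXT <- data.frame(OXT1 = c(3, 3, 3, 3, 3, 3, 2, 1, 2, 3),
                  OXT2 = c(3, 2, 3, 3, 3, 3, 3, 2, 2, 3))

# & compares the columns element by element, giving one TRUE/FALSE per row
is33 <- OXT$OXT1 == 3 & OXT$OXT2 == 3
is32 <- OXT$OXT1 == 3 & OXT$OXT2 == 2
is31 <- OXT$OXT1 == 3 & OXT$OXT2 == 1

# Nested ifelse() mirrors the if / else if chain, applied to every row at once
OXT$HAP <- ifelse(is33, 9, ifelse(is32, 8, ifelse(is31, 7, 0)))
OXT$HAP
# [1] 9 8 9 9 9 9 0 0 0 9
```

With all nine combinations this chain would get long, which is one more reason the lookup-matrix approach is likely the better fit here.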
answered Oct 10 '22 11:10 by jthetzel


Unfortunately, I came late, and with a solution similar to @Andrie's:

dat <- matrix(c(3,3,3,2,3,3,3,3,3,3,3,3,2,3,1,2,2,2,3,3),
              nrow=10, byrow=TRUE)
# here is our lookup table for genotypes
pat <- matrix(1:9, nrow=3, byrow=TRUE, dimnames=list(1:3,1:3))

Then

> pat[dat]
 [1] 9 8 9 9 9 9 6 2 5 9

gives you what you want.
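Since the lookup table here is just 1 to 9 filled row by row, the same codes can also be computed arithmetically as 3 * (first - 1) + second. A small sketch of that shortcut, which only holds for this particular row-major numbering; hap is a name made up for illustration:

```r
dat <- matrix(c(3,3,3,2,3,3,3,3,3,3,3,3,2,3,1,2,2,2,3,3),
              nrow = 10, byrow = TRUE)

# Row-major position in a 3 x 3 table: 3 * (row - 1) + column
hap <- 3 * (dat[, 1] - 1) + dat[, 2]
hap
# [1] 9 8 9 9 9 9 6 2 5 9
```

The lookup matrix remains the more flexible choice, because its entries can be changed to any coding scheme without touching the indexing code.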

However, you might find it easier to use a dedicated package for genetic studies, such as those on CRAN (genetics, gap, or SNPassoc, to name a few) or Bioconductor, because they include facilities for transforming and recoding genotype data and for working with haplotypes.

Here is an example of what I have in mind:

> library(genetics)
> geno1 <- as.genotype.allele.count(dat[,1]-1)
> geno2 <- as.genotype.allele.count(dat[,2]-1)
> table(geno1, geno2)
     geno2
geno1 A/A A/B
  A/A   6   1
  A/B   1   1
  B/B   0   1
answered Oct 10 '22 09:10 (2 revs)