Performing an if statement on each row in R

I am reading a csv file into R that looks like this:

3,3
3,2
3,3
3,3
3,3
3,3
2,3
1,2
2,2
3,3

I want to assign a number to each of the 9 unique combinations my data can take (3 and 3 is 9, 3 and 2 is 8, 2 and 3 is 6, etc.). I have been trying to design a nested if statement that evaluates each row and stores the assigned number in a third column. I believe this can be done with the apply function, but I am having trouble getting the if statement to work inside it. Both columns have possible values of 1, 2, or 3. This is my code so far, which just tries to assign a 9 to the 3/3 rows and 0 to everything else:

#RScript for haplotype analysis

#remove(list=ls())
options(stringsAsFactors=FALSE)
setwd("C:/Documents and Settings/ColumbiaPC/Desktop")

#read in comma-delimited, ID-matched genotype data
OXT <- read.csv("OXTRhaplotype.csv")
colnames(OXT)<- c("OXT1","OXT2")

OXT$HAP <- apply(OXT, 1, function(x) if(x[1]=="3"&&x[2]=="3")x[3]=="9" else 0))

Thanks for any help in advance.

asked May 04 '11 16:05

Bill

3 Answers

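You can solve the problem you describe using a matrix and standard R subsetting, without any if statements. Build a 3 x 3 lookup matrix in which the entry in row i, column j is the number you want for the pair (i, j):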

m <- matrix(1:9, nrow=3, byrow=TRUE)
m

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

This means you can index m using matrix subsetting:

m[3, 2]
[1] 8

m[3,3]
[1] 9

m[2,3]
[1] 6

And now you can apply this to your data:

df <- structure(list(V1 = c(3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 2L, 3L), 
        V2 = c(3L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 3L)), .Names = c("V1", 
        "V2"), class = "data.frame", row.names = c(NA, -10L))

#df$m <- sapply(seq_len(nrow(df)), function(i)m[df$V1[i], df$V2[i]])
df$m <- m[as.matrix(df)]  # Use matrix subsetting, suggested by @Aaron
df

   V1 V2 m
1   3  3 9
2   3  2 8
3   3  3 9
4   3  3 9
5   3  3 9
6   3  3 9
7   2  3 6
8   1  2 2
9   2  2 5
10  3  3 9
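To see why the lookup works: because m is filled by row, the entry at row i, column j is 3 * (i - 1) + j. The sketch below rebuilds the same ten rows and checks that the matrix lookup and that arithmetic agree. The names lookup and arith are made up for this illustration.

```r
# Same ten rows as the example data above
df <- data.frame(V1 = c(3, 3, 3, 3, 3, 3, 2, 1, 2, 3),
                 V2 = c(3, 2, 3, 3, 3, 3, 3, 2, 2, 3))

# Lookup table filled row by row: m[i, j] is 3 * (i - 1) + j
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# A two-column matrix used as an index picks one cell per row of the index
lookup <- m[as.matrix(df)]

# The same codes computed directly (illustrative alternative)
arith <- 3 * (df$V1 - 1) + df$V2

stopifnot(all(lookup == arith))
```

The arithmetic shortcut only works for this particular numbering. The lookup matrix is the more general tool, since you can put any codes you like in its cells.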

answered Oct 10 '22 11:10

Andrie

Andrie's already answered your question by showing a better approach to your problem. But there are a few mistakes in your original code that I want to mention.

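First, & is not the same as &&: & compares vectors element by element, while && evaluates a single TRUE/FALSE value on each side and short-circuits. See ?'&' for more. I believe you wanted to use & in your example, although inside if(), with a single value on each side, either operator gives the same result.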
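A small sketch of the difference, using made-up vectors a and b (not the question's data):

```r
a <- c(3, 3, 2)
b <- c(3, 2, 3)

# & works element by element and returns one result per pair
both_three <- a == 3 & b == 3           # TRUE FALSE FALSE

# && expects a single TRUE/FALSE on each side, as in an if() condition
first_row <- a[1] == 3 && b[1] == 3     # TRUE

# && also short-circuits: the right-hand side is skipped when the left is FALSE
skipped <- FALSE && stop("never reached")   # FALSE, stop() is not called
```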

Second, == tests for equality, which you do correctly in the condition of your example. It does not perform assignment, which is what you attempt when you write x[3]=="9". Assignment is handled by <-, whether inside or outside functions. See ?'==' and ?'<-' for more.

Third, assigning a value to x[3] within the apply() function does not make sense. apply() simply returns an array. It does not modify the OXT object. Below is an example of how your original approach might look. However, Andrie's method is probably better for you.

OXT <- read.table(textConnection(
    "3 3
    3 2
    3 3
    3 3
    3 3
    3 3
    2 3
    1 2
    2 2
    3 3"))
colnames(OXT)<- c("OXT1","OXT2")

OXT$HAP <- apply(OXT, 1, function(x)
    {
        if(x[1] == 3 & x[2] == 3) result <- 9
        else if(x[1] == 3 & x[2] == 2) result <- 8
        else if(x[1] == 3 & x[2] == 1) result <- 7
        else result <- 0
        return(result)
    })
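For comparison, here is a sketch of the same three-case rule written two ways: row by row with apply(), and vectorized with nested ifelse(), which is where & is the right operator. The names hap_apply and hap_vec are made up for this illustration, and the data frame is typed in by hand to match the sample rows.

```r
OXT <- data.frame(OXT1 = c(3, 3, 3, 3, 3, 3, 2, 1, 2, 3),
                  OXT2 = c(3, 2, 3, 3, 3, 3, 3, 2, 2, 3))

# Row-wise: apply() returns a vector; nothing is stored until you assign it
hap_apply <- apply(OXT, 1, function(x) {
    if (x[1] == 3 && x[2] == 3) 9
    else if (x[1] == 3 && x[2] == 2) 8
    else if (x[1] == 3 && x[2] == 1) 7
    else 0
})

# Vectorized: ifelse() tests whole columns at once, so it needs & rather than &&
hap_vec <- with(OXT, ifelse(OXT1 == 3 & OXT2 == 3, 9,
                     ifelse(OXT1 == 3 & OXT2 == 2, 8,
                     ifelse(OXT1 == 3 & OXT2 == 1, 7, 0))))

# Both give the same codes; assign one of them to create the new column
stopifnot(all(hap_apply == hap_vec))
OXT$HAP <- hap_vec
```

Like the example above it, this only covers the three cases where the first column is 3 and returns 0 for everything else.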

answered Oct 10 '22 11:10

jthetzel

Unfortunately, I came late, and with a solution similar to @Andrie's:

dat <- matrix(c(3,3,3,2,3,3,3,3,3,3,3,3,2,3,1,2,2,2,3,3), 
              nrow=10, byrow=TRUE) 
# here is our lookup table for genotypes
pat <- matrix(1:9, nrow=3, byrow=TRUE, dimnames=list(1:3,1:3))

Then

> pat[dat]
 [1] 9 8 9 9 9 9 6 2 5 9

gives you what you want.
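One side effect of giving pat dimnames: if the genotype codes were ever read in as character strings rather than numbers, a character matrix can be used as the index and is matched against the row and column names. A minimal sketch, where chr is a made-up sample and not the question's data:

```r
# Lookup table with row and column names "1", "2", "3"
pat <- matrix(1:9, nrow = 3, byrow = TRUE, dimnames = list(1:3, 1:3))

# Hypothetical genotype pairs stored as strings, one pair per row
chr <- cbind(c("3", "3", "2", "1"),
             c("3", "2", "3", "2"))

# Each row of chr is matched against the dimnames of pat
codes <- pat[chr]   # 9 8 6 2
```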

However, you might find it easier to use a dedicated package for genetic studies, such as those on CRAN (genetics, gap, or SNPassoc, to name a few) or Bioconductor, because they include facilities for transforming and recoding genotype data and for working with haplotypes.

Here is an example of what I have in mind with the above remark:

> library(genetics)
> geno1 <- as.genotype.allele.count(dat[,1]-1)
> geno2 <- as.genotype.allele.count(dat[,2]-1)
> table(geno1, geno2)
     geno2
geno1 A/A A/B
  A/A   6   1
  A/B   1   1
  B/B   0   1

answered Oct 10 '22 09:10

2 revs

Tags: syntax, function, r, apply