
Dictionary style replace multiple items

I have a large data.frame of character data that I want to convert based on what is commonly called a dictionary in other languages.

Currently I am going about it like so:

foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)
foo <- replace(foo, foo == "AA", "0101")
foo <- replace(foo, foo == "AC", "0102")
foo <- replace(foo, foo == "AG", "0103")

This works fine, but it is not pretty, and it seems silly to repeat the replace statement for every item I want to replace in the data.frame.

Is there a better way to do this since I have a dictionary of approximately 25 key/value pairs?

asked Sep 25 '11 18:09 by Stedy


2 Answers

If you're open to using packages, the popular plyr package has a handy mapvalues() function that does just what you're looking for. It operates on a vector or factor, so apply it to each column of the data frame:

library(plyr)
foo[] <- lapply(foo, mapvalues, from = c("AA", "AC", "AG"), to = c("0101", "0102", "0103"))

Note that it works for data types of all kinds, not just strings.
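For readers who would rather not add a dependency, roughly the same per-column mapping can be sketched in base R with match(). The helper name remap is hypothetical and is not part of plyr; like mapvalues(), it leaves values that are absent from the dictionary unchanged, and it is not tied to strings.

```r
# Hypothetical helper (not part of plyr): remap the values of a vector.
remap <- function(x, from, to) {
  idx <- match(x, from)      # position of each element in `from`, NA if absent
  hit <- !is.na(idx)
  x[hit] <- to[idx[hit]]     # replace only the elements found in `from`
  x
}

foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)

# foo[] keeps the data.frame structure while every column is replaced
foo[] <- lapply(foo, remap,
                from = c("AA", "AC", "AG"),
                to   = c("0101", "0102", "0103"))

# The same helper applied to numeric data
nums <- remap(c(1, 2, 3, 2), from = c(2, 3), to = c(20, 30))
```

Here "AT", "GG", "GC" and NA pass through untouched because they have no entry in the dictionary.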

answered Oct 02 '22 20:10 by c.gutierrez


map <- setNames(c("0101", "0102", "0103"), c("AA", "AC", "AG"))
foo[] <- map[unlist(foo)]

This assumes that map covers all the cases in foo. It would feel less like a 'hack', and be more efficient in both space and time, if foo were a matrix (of character()), in which case:

matrix(map[foo], nrow=nrow(foo), dimnames=dimnames(foo)) 
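The lookup above returns NA for any genotype that has no entry in map. If the dictionary might be incomplete, one possible variation is to recode only the cells whose value is a known key. The following is a minimal sketch, assuming the sample foo from the question and that unmapped values should be left as they are; the names m, known and recoded are illustrative.

```r
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)
map <- setNames(c("0101", "0102", "0103"), c("AA", "AC", "AG"))

m <- as.matrix(foo)                # character matrix with the same column names
known <- m %in% names(map)         # TRUE where the cell is a key of map (NA gives FALSE)

recoded <- m                       # copy keeps dim and dimnames
recoded[known] <- map[m[known]]    # look up only the cells that have a mapping
```

With this version, "AT", "GG", "GC" and NA stay in place instead of all becoming NA.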

Both matrix and data frame variants run afoul of R's 2^31-1 limit on vector size when there are millions of SNPs and thousands of samples.
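To make the size concern concrete, the sketch below compares a hypothetical data set (the SNP and sample counts are made up) against the limit. It then recodes a data frame one column at a time, so no single vector holding every cell is created, as unlist(foo) would do. The function name recode_by_column is an assumption for illustration; like the lookup above, it yields NA for values missing from map.

```r
limit <- .Machine$integer.max        # 2^31 - 1

n_snps    <- 2e6                     # hypothetical: two million SNPs
n_samples <- 2000                    # hypothetical: two thousand samples
n_cells   <- n_snps * n_samples      # doubles, so the product does not overflow
too_big   <- n_cells > limit         # would a single vector of all cells exceed the limit?

# Hypothetical helper: look up each column separately
recode_by_column <- function(df, map) {
  for (j in seq_along(df)) {
    df[[j]] <- unname(map[df[[j]]])  # named-vector lookup; NA when no key matches
  }
  df
}

foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)
map <- setNames(c("0101", "0102", "0103"), c("AA", "AC", "AG"))
out <- recode_by_column(foo, map)
```

Each lookup then involves a vector only as long as the number of rows, at the cost of a loop over the columns.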

answered Oct 02 '22 20:10 by Martin Morgan