I have a dataframe df:
colour shape
'red' circle
'blue' square
'blue' circle
'green' sphere
And a double matrix m with named rows/columns
circle square sphere
red 1 4 7
blue 2 5 8
green 3 6 9
I'd like to add a new column to DF such that I obtain:
id colour shape
1 'red' circle
5 'blue' square
2 'blue' circle
9 'green' sphere
I've tried doing this with the following code but it doesn't seem to work:
df$id <- m[df$colour,df$shape]
I've also tried apply(); and similar but with no luck. Can anyone tell me the right approach to doing this without using a loop?
I think I might win the shortest answer contest here as long as those are character vectors rather than factors which might be more expected unless you made specifid effort to avoid. It really only adds cbind
to convert the two df "character" vectors to a two column matrix expected by the [.matrix
function that you were very close to success in using. (And it also seems reasonably expressive.)
# Data construct
d <- data.frame(color=c('red','blue','blue','green'),
shape=c('circle','square','circle','sphere'), stringsAsFactors=FALSE)
m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
# Code:
d$id <- with( d, m [ cbind(color, shape) ] )
d
color shape id
1 red circle 1
2 blue square 5
3 blue circle 2
4 green sphere 9
A rather simple (and fast!) alternative is to use a matrix to index into your matrix:
# Your data
d <- data.frame(color=c('red','blue','blue','green'), shape=c('circle','square','circle','sphere'))
m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
# Create index matrix - each row is a row/col index
i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
# Now use it and add as the id column...
d2 <- cbind(id=m[i], d)
d2
# id color shape
#1 1 red circle
#2 5 blue square
#3 2 blue circle
#4 9 green sphere
The match
function is used to find the corresponding numeric index for a particular string.
Note that in newer version of R (2.13 and newer I think), you can use character strings in the index matrix. Unfortunately, the color and shape columns are typically factors
, and cbind
doesn't like that (it uses the integer codes), so you need to coerce them with as.character
:
i <- cbind(as.character(d$color), as.character(d$shape))
...I suspect that using match
is more efficient though.
EDIT I measured and it seems to be about 20% faster to use match
:
# Make 1 million rows
d <- d[sample.int(nrow(d), 1e6, TRUE), ]
system.time({
i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
d2 <- cbind(id=m[i], d)
}) # 0.46 secs
system.time({
i <- cbind(as.character(d$color), as.character(d$shape))
d2 <- cbind(id=m[i], d)
}) # 0.55 secs
Another answer Using the reshape2 and plyr (optional just for join) packages.
require(plyr)
require(reshape2)
Df <- data.frame(colour = c("red", "blue", "blue", "green"),
shape = c("circle", "square", "circle", "sphere"))
Mat <- matrix(1:9, dimnames = list(c("red", "blue", "green"),
c("circle", "square", "sphere")),
nrow = 3)
Df2 <- melt.array(Mat, varnames = c("colour", "shape"))
join(Df, Df2)
result <- join(Df, Df2)
join(Df, Df2)
Joining by: colour, shape
colour shape value
1 red circle 1
2 blue square 5
3 blue circle 2
4 green sphere 9
Hope this help
merge()
is your friend here. To use it, we need an appropriate data frame to merge with containing the stacked version of your ID matrix. I create that as newdf
with the code below:
df <- data.frame(matrix(1:9, ncol = 3))
colnames(df) <- c("circle","square","sphere")
rownames(df) <- c("red","blue","green")
newdf <- cbind.data.frame(ID = unlist(df),
expand.grid(colour = rownames(df),
shape = colnames(df)))
Which results in:
> newdf
ID colour shape
circle1 1 red circle
circle2 2 blue circle
circle3 3 green circle
square1 4 red square
square2 5 blue square
square3 6 green square
sphere1 7 red sphere
sphere2 8 blue sphere
sphere3 9 green sphere
Then with your original data in object df2
, defined using
df2 <- data.frame(colour = c("red","blue","blue","green"),
shape = c("circle","square","circle","sphere"))
use merge()
> merge(newdf, df2, sort = FALSE)
colour shape ID
1 red circle 1
2 blue circle 2
3 blue square 5
4 green sphere 9
You can store that and rearrange the columns if you need that:
> res <- merge(newdf, df2, sort = FALSE)
> res <- res[,c(3,1,2)]
> res
ID colour shape
1 1 red circle
2 2 blue circle
3 5 blue square
4 9 green sphere
You could also convert the matrix m to a vector and then match the ID to the colour and shape values:
df<-data.frame(colour=c("red","blue","blue","green"),
shape=c("circle","square","circle","sphere"))
m<-matrix(1:9,nrow=3,dimnames=list(c("red","blue","green"),
c("circle","square","sphere")))
mVec<-as.vector(m)
The next step matches the colour in df to the appropriate dimname in the m matrix, then adds an integer corresponding to the shape. The result in the index of the m vector with the corresponding ID.
df$ID<-mVec[match(df$colour, dimnames(m)[[1]]) + (dim(m)[1]*
(match(df$shape, dimnames(m)[[2]]) - 1))]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With