I am working with microarray data.
I have two tables: one is a pathway and gene set table (I will call it the A table) and the other is a microarray table (let's call it the B table).
I need to replace the gene symbols (characters) in the A table with expression values (numbers), using the expression value recorded for each gene symbol in the B table.
The tables look like this:
A table
Pathway  v1  v2  ...  v249  v250
1        A   E   ...  NA    NA
2        B   A   ...  Z     I
3        C   G   ...  X     NA
4        D   K   ...  P     NA

B table
Gene  Value
E     1000
A     500
G     200
B     300
P     10
Z     20
I want to transform the A table as follows:
A table
Pathway v1 v2 ... v249 v250
1 500 1000 NA NA
2 300 500 20 NA
3 NA 200 NA NA
4 NA NA 10 NA
Gene symbols that have no match in the B table should be replaced with NA.
We can also do this using base R. We convert the subset of 'A' (i.e. every column except 'Pathway') to a matrix and match it against 'Gene' from 'B'. The numeric index obtained is then used to pull the corresponding entries from the 'Value' column, and the output is assigned back to those columns.
A1 <- A
A1[-1] <- B$Value[match(as.matrix(A[-1]), B$Gene)]
A1
# Pathway v1 v2
#1 1 500 1000
#2 2 300 500
#3 3 NA 200
#4 4 NA NA
NOTE: Datasets from @DavidArenburg's post.
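To see why the one-liner works, the lookup can be unrolled into separate steps. This is only an illustrative sketch: it rebuilds small character versions of the example tables (an assumption, since the datasets in this thread store the symbols as factors), and the intermediate names genes, idx, values and A_step are made up for the walkthrough.

```r
# Small stand-ins for the example tables (assumed: plain character columns).
A <- data.frame(Pathway = 1:4,
                v1 = c("A", "B", "C", "D"),
                v2 = c("E", "A", "G", "K"),
                stringsAsFactors = FALSE)
B <- data.frame(Gene = c("E", "A", "G", "B"),
                Value = c(1000L, 500L, 200L, 300L),
                stringsAsFactors = FALSE)

# Step 1: gather the gene columns into one character matrix.
genes <- as.matrix(A[-1])

# Step 2: for every symbol, find its row position in B$Gene (NA if absent).
idx <- match(genes, B$Gene)

# Step 3: index B$Value by those positions; an NA position yields an NA value.
values <- B$Value[idx]

# Step 4: match() returns a plain vector, so restore the matrix shape.
dim(values) <- dim(genes)

# Step 5: write the looked-up values back over the gene columns of a copy.
A_step <- A
A_step[-1] <- values
A_step
```

Because match() returns NA for symbols that are not in B$Gene, unmatched genes become NA without any special handling.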
I would suggest first melting, then merging, then dcasting back. This will work for any number of columns in the A data set. I will be using the latest data.table version on CRAN for this (v1.9.6+).
library(data.table) # V 1.9.6+
res <- melt(setDT(A), id = "Pathway")[setDT(B), Value := i.Value, on = c(value = "Gene")]
dcast(res, Pathway ~ variable, value.var = "Value")
# Pathway v1 v2
# 1: 1 500 1000
# 2: 2 300 500
# 3: 3 NA 200
# 4: 4 NA NA
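For readers without data.table, the same melt, merge, cast idea can be approximated in base R. This is a rough analogue under stated assumptions, not what data.table does internally: the tables are rebuilt here as plain character data frames, and gene_cols, long, merged and wide are hypothetical names chosen for the sketch.

```r
# Small stand-ins for the example tables (assumed: plain character columns).
A <- data.frame(Pathway = 1:4,
                v1 = c("A", "B", "C", "D"),
                v2 = c("E", "A", "G", "K"),
                stringsAsFactors = FALSE)
B <- data.frame(Gene = c("E", "A", "G", "B"),
                Value = c(1000L, 500L, 200L, 300L),
                stringsAsFactors = FALSE)

# "Melt": one row per (Pathway, column name, gene symbol) combination.
gene_cols <- names(A)[-1]
long <- data.frame(
  Pathway  = rep(A$Pathway, times = length(gene_cols)),
  variable = rep(gene_cols, each = nrow(A)),
  Gene     = unlist(A[gene_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# "Merge": left join on Gene; unmatched symbols keep their row with Value = NA.
merged <- merge(long, B, by = "Gene", all.x = TRUE)

# "Cast": back to one row per Pathway and one column per original variable.
# Each (Pathway, variable) cell holds exactly one value, so take the first.
wide <- tapply(merged$Value,
               list(Pathway = merged$Pathway, variable = merged$variable),
               function(x) x[1])
wide
```

Note that tapply() returns a matrix with Pathway as row names rather than a data frame, so a final conversion step would be needed to reproduce the exact layout shown above.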
Or similarly using the Hadleyverse (dplyr and tidyr)
library(dplyr)
library(tidyr)
A %>%
  gather(res, Gene, -Pathway) %>%
  left_join(., B, by = "Gene") %>%
  select(-Gene) %>%
  spread(res, Value)
# Pathway v1 v2
# 1 1 500 1000
# 2 2 300 500
# 3 3 NA 200
# 4 4 NA NA
Data
A <- structure(list(Pathway = 1:4, v1 = structure(1:4, .Label = c("A",
"B", "C", "D"), class = "factor"), v2 = structure(c(2L, 1L, 3L,
4L), .Label = c("A", "E", "G", "K"), class = "factor")), .Names = c("Pathway",
"v1", "v2"), class = "data.frame", row.names = c(NA, -4L))
B <- structure(list(Gene = structure(c(3L, 1L, 4L, 2L), .Label = c("A",
"B", "E", "G"), class = "factor"), Value = c(1000L, 500L, 200L,
300L)), .Names = c("Gene", "Value"), class = "data.frame", row.names = c(NA,
-4L))