 

How to change values from character to number using if and for statement?

Tags:

r

I am working with microarray data.

I have two tables: one is a pathway and gene set table (I will call it table A) and the other is a microarray table (let's call it table B).

I need to replace the gene symbols (characters) in table A with expression values (numbers), using the expression value of each gene symbol in table B.

The tables look like this:

A table                                            B table
Pathway   v1    v2   ...v249 v250                 Gene      Value         
   1       A    E        NA   NA                   E        1000
   2       B    A        Z    I                    A         500
   3       C    G        X    NA                   G         200
   4       D    K        P    NA                   B         300
                                                   P          10
                                                   Z          20

I want table A to end up like this:

   A table                            
Pathway   v1       v2   ...    v249 v250      
   1       500    1000         NA    NA 
   2       300    500          20    NA
   3       NA     200          NA    NA
   4       NA     NA           10    NA 

Gene symbols that have no match in table B should be replaced with NA.

Sejin asked Feb 08 '23 18:02



2 Answers

We can also do this using base R. We convert the subset of 'A' (everything except the 'Pathway' column) to a matrix and match it against 'Gene' from 'B'. The resulting numeric index picks out the corresponding entries of the 'Value' column (NA where there is no match), and we assign the output back to the same columns.

A1 <- A
A1[-1] <- B$Value[match(as.matrix(A[-1]), B$Gene)]
A1
#  Pathway  v1   v2
#1       1 500 1000
#2       2 300  500
#3       3  NA  200
#4       4  NA   NA

NOTE: Datasets from @DavidArenburg's post.
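
To make the one-liner above easier to follow, here is a rough step-by-step sketch of the same `match` idea, plus an equivalent `for` loop since the question asks about one. The toy data below is my own assumption: it mirrors the question's tables but uses character columns rather than factors, and the intermediate names (`genes`, `idx`, `vals`, `A2`) are only for illustration.

```r
# Hypothetical toy data mirroring the question (character columns, not factors)
A <- data.frame(Pathway = 1:4,
                v1 = c("A", "B", "C", "D"),
                v2 = c("E", "A", "G", "K"),
                stringsAsFactors = FALSE)
B <- data.frame(Gene  = c("E", "A", "G", "B"),
                Value = c(1000, 500, 200, 300),
                stringsAsFactors = FALSE)

# Step 1: turn the gene columns into a character matrix
genes <- as.matrix(A[-1])

# Step 2: for each symbol, find its row position in B (NA when it is absent)
idx <- match(genes, B$Gene)

# Step 3: index Value by those positions; an NA position gives an NA value
vals <- B$Value[idx]

# Step 4: assign back; the values are in column-major order, one block per column
A1 <- A
A1[-1] <- vals

# Equivalent explicit loop over the gene columns
A2 <- A
for (col in names(A)[-1]) {
  A2[[col]] <- B$Value[match(A[[col]], B$Gene)]
}
```

No `if` statement is needed for the unmatched symbols, because `match` returns NA for them and indexing with NA yields NA.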

akrun answered Feb 11 '23 07:02



I would suggest first melting, then merging, then dcasting back. This will work for any number of columns in the A data set. I will be using the latest data.table version on CRAN for this (v1.9.6+).

library(data.table) # V 1.9.6+
res <- melt(setDT(A), id = "Pathway")[setDT(B), Value := i.Value, on = c(value = "Gene")]
dcast(res, Pathway ~ variable, value.var = "Value")
#    Pathway  v1   v2
# 1:       1 500 1000
# 2:       2 300  500
# 3:       3  NA  200
# 4:       4  NA   NA

Or similarly using the Hadleyverse (dplyr and tidyr):

library(dplyr)
library(tidyr)
A %>%
  gather(res, Gene, -Pathway) %>%
  left_join(., B, by = "Gene") %>%
  select(-Gene) %>%
  spread(res, Value)
#   Pathway  v1   v2
# 1       1 500 1000
# 2       2 300  500
# 3       3  NA  200
# 4       4  NA   NA  
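
In recent tidyr versions `gather` and `spread` are superseded by `pivot_longer` and `pivot_wider`, so the same reshape, join, reshape pipeline could be sketched as below. This is only a sketch under assumptions: the toy data is redefined with character columns, each gene is assumed to appear only once in B, and the column name `variable` is my own choice.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data mirroring the question (character columns, not factors)
A <- data.frame(Pathway = 1:4,
                v1 = c("A", "B", "C", "D"),
                v2 = c("E", "A", "G", "K"),
                stringsAsFactors = FALSE)
B <- data.frame(Gene  = c("E", "A", "G", "B"),
                Value = c(1000, 500, 200, 300),
                stringsAsFactors = FALSE)

res <- A %>%
  # long format: one row per (Pathway, column name, gene symbol)
  pivot_longer(cols = -Pathway, names_to = "variable", values_to = "Gene") %>%
  # look up the expression value; unmatched symbols get NA
  left_join(B, by = "Gene") %>%
  select(-Gene) %>%
  # back to wide format, one column per original gene column
  pivot_wider(names_from = variable, values_from = Value)
res
```

If a gene could appear more than once in B, the join would duplicate rows, so B would need to be de-duplicated or summarised first.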

Data

A <- structure(list(Pathway = 1:4, v1 = structure(1:4, .Label = c("A", 
"B", "C", "D"), class = "factor"), v2 = structure(c(2L, 1L, 3L, 
4L), .Label = c("A", "E", "G", "K"), class = "factor")), .Names = c("Pathway", 
"v1", "v2"), class = "data.frame", row.names = c(NA, -4L))

B <- structure(list(Gene = structure(c(3L, 1L, 4L, 2L), .Label = c("A", 
"B", "E", "G"), class = "factor"), Value = c(1000L, 500L, 200L, 
300L)), .Names = c("Gene", "Value"), class = "data.frame", row.names = c(NA, 
-4L))
David Arenburg answered Feb 11 '23 08:02
