Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add column to table with data from another table

Tags:

dataframe

r

I have a table that looks as follows:

Table1 <- data.frame(
    "Random" = c("A", "B", "C"), 
    "Genes" = c("Apple", "Candy", "Toothpaste"), 
    "Extra" = c("Up", "", "Down"), 
    "Desc" = c("Healthy,Red,Fruit", "Sweet,Cavities,Sugar,Fruity", "Minty,Dentist")
)

giving:

  Random      Genes Extra                       Desc
1      A      Apple    Up          Healthy,Red,Fruit
2      B      Candy       Sweet,Cavities,Sugar,Fruity
3      C Toothpaste  Down              Minty,Dentist

I have another table with Descriptions and want to add a columns with Genes. For instance Table2 would be:

Table2 <- data.frame(
    "Col1" = c(1, 2, 3, 4, 5, 6), 
    "Desc" = c("Sweet", "Sugar", "Dentist", "Red", "Fruit", "Fruity")
)

Giving:

  Col1    Desc
1    1   Sweet
2    2   Sugar
3    3 Dentist
4    4     Red
5    5   Fruit
6    6  Fruity

I want to add another column to Table2 named "Genes" that matches the "Desc" from both tables and adds the Genes from Table1 to get:

  Col1    Desc    Gene
1    1   Sweet    Candy
2    2   Sugar    Candy
3    3 Dentist    Toothpaste
4    4     Red    Apple
5    5   Fruit    Apple
6    6  Fruity    Candy
like image 624
user2165857 Avatar asked May 06 '15 17:05

user2165857


2 Answers

You could try cSplit from splitstackshape to split the 'Desc' column in "Table1" and convert the dataset from 'wide' to 'long' format. The output will be a data.table. We can use the data.table methods for setting the key column as 'Desc' (setkey), join with "Table2", and finally remove the columns that are not needed in the output either by selecting the columns or assigning (:=) the unwanted columns to NULL

library(splitstackshape)
setkey(cSplit(Table1, 'Desc', ',', 'long'),Desc)[Table2[2:1]][
                   ,c(5,4,2), with=FALSE]
#  Col1    Desc      Genes
#1:    1   Sweet      Candy
#2:    2   Sugar      Candy
#3:    3 Dentist Toothpaste
#4:    4     Red      Apple
#5:    5   Fruit      Apple
#6:    6  Fruity      Candy
like image 175
akrun Avatar answered Oct 29 '22 01:10

akrun


Here is a method in base R that uses an intermediate linking table:

# create an intermediate data.frame with all the key (Desc) / value (Gene) pairs
df  <-  NULL
for(i in seq(nrow(Table1)))
    df  <-  rbind(df,
                  data.frame(Gene =Table1$Genes[i],
                            Desc =strsplit(as.character(Table1$Desc)[i],',')[[1]]))
df 
#>         Gene     Desc
#> 1      Apple  Healthy
#> 2      Apple      Red
#> 3      Apple    Fruit
#> 4      Candy    Sweet
#> 5      Candy Cavities
#> 6      Candy    Sugar
#> 7      Candy   Fruity
#> 8 Toothpaste    Minty
#> 9 Toothpaste  Dentist

Now link to it in the usual way:

Table2$Gene  <-  df$Gene[match(Table2$Desc,df$Desc)]
like image 5
Jthorpe Avatar answered Oct 29 '22 01:10

Jthorpe