
Merge dataframes, different lengths

Tags: merge, r

I want to add variables from dat2:

          concreteness familiarity typicality
amoeba            3.60        1.30       1.71
bacterium         3.82        3.48       2.13
leech             5.71        1.83       4.50

To dat1:

    ID  variable value
1    1    amoeba     0
2    2    amoeba     0
3    3    amoeba    NA
251  1 bacterium     0
252  2 bacterium     0
253  3 bacterium     0
501  1     leech     1
502  2     leech     1
503  3     leech     0

Giving the following output:

    X ID  variable value concreteness familiarity typicality
1   1  1    amoeba     0         3.60        1.30       1.71
2   2  2    amoeba     0         3.60        1.30       1.71
3   3  3    amoeba    NA         3.60        1.30       1.71
4 251  1 bacterium     0         3.82        3.48       2.13
5 252  2 bacterium     0         3.82        3.48       2.13
6 253  3 bacterium     0         3.82        3.48       2.13
7 501  1     leech     1         5.71        1.83       4.50
8 502  2     leech     1         5.71        1.83       4.50
9 503  3     leech     0         5.71        1.83       4.50

As you can see, the info from dat2 has to be replicated over several rows of dat1.

This was my failed attempt:

dat3 <- merge(dat1, dat2, by=intersect(dat1$variable(dat1), dat2$row.names(dat2)))

Giving the following error:

Error in as.vector(y) : attempt to apply non-function

Reproducible examples (dput output):

dat1:

structure(list(ID = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), variable = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("amoeba", "bacterium", 
"leech", "centipede", "lizard", "tapeworm", "head lice", "maggot", 
"ant", "moth", "mosquito", "earthworm", "caterpillar", "scorpion", 
"snail", "spider", "grasshopper", "dust mite", "tarantula", "termite", 
"bat", "wasp", "silkworm"), class = "factor"), value = c(0L, 
0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)), .Names = c("ID", "variable", 
"value"), row.names = c(1L, 2L, 3L, 251L, 252L, 253L, 501L, 502L, 
503L), class = "data.frame")

dat2:

structure(list(concreteness = c(3.6, 3.82, 5.71), familiarity = c(1.3, 
3.48, 1.83), typicality = c(1.71, 2.13, 4.5)), .Names = c("concreteness", 
"familiarity", "typicality"), row.names = c("amoeba", "bacterium", 
"leech"), class = "data.frame")
SarahDew asked Dec 31 '12 14:12



2 Answers

You could add a join variable to dat2 and then use merge:

dat2$variable <- rownames(dat2)
merge(dat1, dat2)
   variable ID value concreteness familiarity typicality
1    amoeba  1     0         3.60        1.30       1.71
2    amoeba  2     0         3.60        1.30       1.71
3    amoeba  3    NA         3.60        1.30       1.71
4 bacterium  1     0         3.82        3.48       2.13
5 bacterium  2     0         3.82        3.48       2.13
6 bacterium  3     0         3.82        3.48       2.13
7     leech  1     1         5.71        1.83       4.50
8     leech  2     1         5.71        1.83       4.50
9     leech  3     0         5.71        1.83       4.50
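
A variation on the above, as a minimal sketch: it builds the key on a copy so dat2 stays unchanged, and names the join column explicitly instead of relying on merge picking up the shared column name. The small dat1 and dat2 here are stand-ins with the question's values, and dat2_keyed is a name made up for this example.

```r
# Stand-ins for the question's data (same values, fewer factor levels).
dat1 <- data.frame(
  ID = rep(1:3, times = 3),
  variable = factor(rep(c("amoeba", "bacterium", "leech"), each = 3)),
  value = c(0L, 0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)
)
dat2 <- data.frame(
  concreteness = c(3.60, 3.82, 5.71),
  familiarity = c(1.30, 3.48, 1.83),
  typicality = c(1.71, 2.13, 4.50),
  row.names = c("amoeba", "bacterium", "leech")
)

# Copy dat2 and turn its row names into an ordinary key column.
dat2_keyed <- data.frame(variable = rownames(dat2), dat2)

# Join on the named key; each dat2 row is repeated for every matching dat1 row.
dat3 <- merge(dat1, dat2_keyed, by = "variable")
dat3
```

Spelling out by = "variable" guards against an accidental join on some other column that the two data frames happen to share.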
agstudy answered Sep 23 '22 15:09


Try this:

merge(dat1, dat2, by.x = 2, by.y = 0, all.x = TRUE)

This assumes that rows of dat1 with no match in dat2 should be kept, with the dat2 columns filled with NA, and that rows of dat2 with no match in dat1 should be dropped. For example:

dat2a <- dat2
rownames(dat2a)[3] <- "elephant"
# the merge still works; the unmatched leech rows get NA in the dat2 columns:
merge(dat1, dat2a, by.x = 2, by.y = 0, all.x = TRUE)
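
To make the effect of all.x = TRUE concrete, here is a small sketch comparing the left join with the default inner join on the renamed data. The dat1 and dat2 below are stand-ins with the question's values; left and inner are names chosen for this example, and by.x = "variable" is used in place of the column index 2.

```r
# Stand-ins for the question's data (same values, fewer factor levels).
dat1 <- data.frame(
  ID = rep(1:3, times = 3),
  variable = factor(rep(c("amoeba", "bacterium", "leech"), each = 3)),
  value = c(0L, 0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)
)
dat2 <- data.frame(
  concreteness = c(3.60, 3.82, 5.71),
  familiarity = c(1.30, 3.48, 1.83),
  typicality = c(1.71, 2.13, 4.50),
  row.names = c("amoeba", "bacterium", "leech")
)

# Rename the third row so that "leech" no longer has a match in dat2a.
dat2a <- dat2
rownames(dat2a)[3] <- "elephant"

# Left join: every dat1 row is kept; by.y = 0 joins on dat2a's row names.
left <- merge(dat1, dat2a, by.x = "variable", by.y = 0, all.x = TRUE)

# Default (inner) join: dat1 rows without a match are dropped.
inner <- merge(dat1, dat2a, by.x = "variable", by.y = 0)

nrow(left)   # all nine dat1 rows, leech rows carry NA in the dat2 columns
nrow(inner)  # only the six amoeba and bacterium rows
```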

The above is known as a left join in SQL and can be done like this in sqldf (ignore the warning):

library(sqldf)
sqldf("select * 
         from dat1 left join dat2 
         on dat1.variable = dat2.row_names", 
       row.names = TRUE)
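
If the sorting that merge applies to the result is unwanted, the same lookup can be sketched in base R with match(). This is a different technique from the answers above, not a restatement of them. The dat1 and dat2 below are stand-ins with the question's values, and idx and looked_up are names made up for this example.

```r
# Stand-ins for the question's data (same values, fewer factor levels).
dat1 <- data.frame(
  ID = rep(1:3, times = 3),
  variable = factor(rep(c("amoeba", "bacterium", "leech"), each = 3)),
  value = c(0L, 0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)
)
dat2 <- data.frame(
  concreteness = c(3.60, 3.82, 5.71),
  familiarity = c(1.30, 3.48, 1.83),
  typicality = c(1.71, 2.13, 4.50),
  row.names = c("amoeba", "bacterium", "leech")
)

# For each dat1 row, find the position of its name among dat2's row names.
# as.character() avoids matching on the factor's internal integer codes.
idx <- match(as.character(dat1$variable), rownames(dat2))

# Pull the matching dat2 rows (an unmatched name gives NA, hence a row of NAs).
looked_up <- dat2[idx, , drop = FALSE]
rownames(looked_up) <- NULL

# Bind side by side; the rows stay in dat1's original order.
dat3 <- cbind(dat1, looked_up)
dat3
```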
G. Grothendieck answered Sep 24 '22 15:09