Merge dataframes, different lengths

Tags: merge, r

I want to add variables from dat2:

          concreteness familiarity typicality
amoeba            3.60        1.30       1.71
bacterium         3.82        3.48       2.13
leech             5.71        1.83       4.50

To dat1:

    ID  variable value
1    1    amoeba     0
2    2    amoeba     0
3    3    amoeba    NA
251  1 bacterium     0
252  2 bacterium     0
253  3 bacterium     0
501  1     leech     1
502  2     leech     1
503  3     leech     0

Giving the following output:

    X ID  variable value concreteness familiarity typicality
1   1  1    amoeba     0         3.60        1.30       1.71
2   2  2    amoeba     0         3.60        1.30       1.71
3   3  3    amoeba    NA         3.60        1.30       1.71
4 251  1 bacterium     0         3.82        3.48       2.13
5 252  2 bacterium     0         3.82        3.48       2.13
6 253  3 bacterium     0         3.82        3.48       2.13
7 501  1     leech     1         5.71        1.83       4.50
8 502  2     leech     1         5.71        1.83       4.50
9 503  3     leech     0         5.71        1.83       4.50

As you can see, the info from dat2 has to be replicated over several rows of dat1.

This was my failed attempt:

dat3 <- merge(dat1, dat2, by=intersect(dat1$variable(dat1), dat2$row.names(dat2)))

Giving the following error:

Error in as.vector(y) : attempt to apply non-function

Please find reproducible examples here:

dat1:

structure(list(ID = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), variable = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("amoeba", "bacterium", 
"leech", "centipede", "lizard", "tapeworm", "head lice", "maggot", 
"ant", "moth", "mosquito", "earthworm", "caterpillar", "scorpion", 
"snail", "spider", "grasshopper", "dust mite", "tarantula", "termite", 
"bat", "wasp", "silkworm"), class = "factor"), value = c(0L, 
0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)), .Names = c("ID", "variable", 
"value"), row.names = c(1L, 2L, 3L, 251L, 252L, 253L, 501L, 502L, 
503L), class = "data.frame")

dat2:

structure(list(concreteness = c(3.6, 3.82, 5.71), familiarity = c(1.3, 
3.48, 1.83), typicality = c(1.71, 2.13, 4.5)), .Names = c("concreteness", 
"familiarity", "typicality"), row.names = c("amoeba", "bacterium", 
"leech"), class = "data.frame")

asked Dec 31 '12 14:12

SarahDew

2 Answers

You could add a join variable to dat2 and then use merge:

dat2$variable <- rownames(dat2)
merge(dat1, dat2)
   variable ID value concreteness familiarity typicality
1    amoeba  1     0         3.60        1.30       1.71
2    amoeba  2     0         3.60        1.30       1.71
3    amoeba  3    NA         3.60        1.30       1.71
4 bacterium  1     0         3.82        3.48       2.13
5 bacterium  2     0         3.82        3.48       2.13
6 bacterium  3     0         3.82        3.48       2.13
7     leech  1     1         5.71        1.83       4.50
8     leech  2     1         5.71        1.83       4.50
9     leech  3     0         5.71        1.83       4.50
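
A minimal, self-contained sketch of the approach above, assuming a trimmed rebuild of the question's data (unused factor levels and the original row names of dat1 are dropped for brevity). The explicit by = "variable" is an addition here, not part of the original answer; it names the key rather than relying on merge()'s default of joining on every column name the two data frames share. The name dat3 is borrowed from the question.

```r
# Trimmed rebuild of the question's data (an assumption for illustration).
dat1 <- data.frame(
  ID       = rep(1:3, times = 3),
  variable = factor(rep(c("amoeba", "bacterium", "leech"), each = 3)),
  value    = c(0L, 0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)
)
dat2 <- data.frame(
  concreteness = c(3.60, 3.82, 5.71),
  familiarity  = c(1.30, 3.48, 1.83),
  typicality   = c(1.71, 2.13, 4.50),
  row.names    = c("amoeba", "bacterium", "leech")
)

# Copy the row names into a real column so both data frames share a key.
dat2$variable <- rownames(dat2)

# Join on the named key; each dat2 row is repeated for every matching dat1 row.
dat3 <- merge(dat1, dat2, by = "variable")
print(dat3)
```

If other columns in the two data frames ever happen to share a name, spelling out the key this way keeps them from being silently used as join columns.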

answered Sep 23 '22 15:09

agstudy

Try this:

merge(dat1, dat2, by.x = 2, by.y = 0, all.x = TRUE)

This assumes that if any rows in dat1 are unmatched, the dat2 columns in the result should be filled with NA, and that any unmatched rows in dat2 are disregarded. For example:

dat2a <- dat2
rownames(dat2a)[3] <- "elephant"
# the above still works:
merge(dat1, dat2a, by.x = 2, by.y = 0, all.x = TRUE)
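
To make the effect of all.x = TRUE concrete, here is a small sketch contrasting it with merge()'s default inner join. The data are a trimmed rebuild of the question's (an assumption for illustration), and the names left and inner are hypothetical. by.x = "variable" and by.y = "row.names" are the named equivalents of by.x = 2 and by.y = 0.

```r
# Trimmed rebuild of the question's data (an assumption for illustration).
dat1 <- data.frame(
  ID       = rep(1:3, times = 3),
  variable = factor(rep(c("amoeba", "bacterium", "leech"), each = 3)),
  value    = c(0L, 0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)
)
dat2a <- data.frame(
  concreteness = c(3.60, 3.82, 5.71),
  familiarity  = c(1.30, 3.48, 1.83),
  typicality   = c(1.71, 2.13, 4.50),
  row.names    = c("amoeba", "bacterium", "elephant")  # "leech" replaced
)

# Left join: every dat1 row is kept; the leech rows have no match in dat2a,
# so their dat2a columns are filled with NA.
left <- merge(dat1, dat2a, by.x = "variable", by.y = "row.names", all.x = TRUE)

# Default (inner) join: the unmatched leech rows are dropped instead.
inner <- merge(dat1, dat2a, by.x = "variable", by.y = "row.names")

print(nrow(left))   # all rows of dat1
print(nrow(inner))  # only rows with a match in dat2a
```

In both cases the "elephant" row of dat2a is discarded, because all.y is left at its default of FALSE.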

The above is known as a left join in SQL and can be done like this in sqldf (ignore the warning):

library(sqldf)
sqldf("select * 
         from dat1 left join dat2 
         on dat1.variable = dat2.row_names", 
       row.names = TRUE)
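
The same left join can also be sketched as a plain lookup with match(), without merge() or sqldf. This is offered as an alternative under the same assumptions, not as part of the original answer; the data are a trimmed rebuild of the question's, and idx and lookup are hypothetical names. Unlike merge(), which sorts the result on the key by default, this keeps the rows of dat1 in their original order.

```r
# Trimmed rebuild of the question's data (an assumption for illustration).
dat1 <- data.frame(
  ID       = rep(1:3, times = 3),
  variable = factor(rep(c("amoeba", "bacterium", "leech"), each = 3)),
  value    = c(0L, 0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)
)
dat2 <- data.frame(
  concreteness = c(3.60, 3.82, 5.71),
  familiarity  = c(1.30, 3.48, 1.83),
  typicality   = c(1.71, 2.13, 4.50),
  row.names    = c("amoeba", "bacterium", "leech")
)

# For each row of dat1, find the position of its key among dat2's row names.
# Unmatched keys give NA, which later yields a row of NA values.
idx <- match(as.character(dat1$variable), rownames(dat2))

# Pull the matching dat2 rows (repeated as needed) and drop their row names
# so they do not interfere when the two pieces are bound together.
lookup <- dat2[idx, , drop = FALSE]
rownames(lookup) <- NULL

dat3 <- cbind(dat1, lookup)
print(dat3)
```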

answered Sep 24 '22 15:09

G. Grothendieck
