join matching columns in a data.frame or data.table

I have the following data.frames:

a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA))
> a
  id   v1   v2
1  1    a <NA>
2  2 <NA>    b
3  3 <NA>    c
> b
  id   v1   v2
1  1 <NA>    A
2  2    B <NA>
3  3    C <NA>

Note: no id has a non-NA value for v1 (or v2) in both tables; for each id, each column has exactly one non-NA value across the two tables.
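The condition in this note can be checked directly. A minimal sketch, assuming character (not factor) columns; `exactly_one` is a hypothetical name. After merging on `id`, it confirms that for every id exactly one of the two tables supplies a non-NA value in each of `v1` and `v2`:

```r
a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'),
                stringsAsFactors = FALSE)
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA),
                stringsAsFactors = FALSE)

# merge() suffixes the shared columns: v1.x/v2.x come from a, v1.y/v2.y from b
m <- merge(a, b, by = "id")
vars <- c("v1", "v2")

# TRUE for a column when every id has a non-NA value in exactly one table
exactly_one <- vapply(vars, function(v) {
  has_x <- !is.na(m[[paste0(v, ".x")]])
  has_y <- !is.na(m[[paste0(v, ".y")]])
  all(has_x + has_y == 1)
}, logical(1))
exactly_one
```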

I would like to merge these data frames on matching values of `id`:

ab <- merge(a, b, by = "id")

but I would also like to combine the `v1` columns from both tables into a single `v1`, and likewise for `v2`, so that the data.frame `ab` looks like this:

ab <- data.frame(id = 1:3, v1 = c("a", "B", "C"), v2 = c("A", "b", "c"))

> ab
  id v1 v2
1  1  a  A
2  2  B  b
3  3  C  c

Instead, I get this:

> merge(a, b, by = "id")
  id v1.x v2.x v1.y v2.y
1  1    a <NA> <NA>    A
2  2 <NA>    b    B <NA>
3  3 <NA>    c    C <NA>

It would be helpful to have examples using both `data.frame` and `data.table`, so here are the data.table versions of the above:

library(data.table)
A <- data.table(a, key = 'id')
B <- data.table(b, key = 'id')
# keyed join on id; v1 and v2 from each table are kept as separate columns
A[B]


asked Mar 29 '12 02:03

David LeBauer

2 Answers

The type of merge you describe probably isn't possible using merge alone (with data frames), although saying that usually invites being proved wrong.

You also omit some details: will there always be a single unique non-NA value in each column for each id value? If so, this will work:

library(plyr)
ab <- rbind(a, b)
# keep only the non-NA entries of a column
colFun <- function(x){x[which(!is.na(x))]}
ddply(ab, .(id), function(x){colwise(colFun)(x)})
  id v1 v2
1  1  a  A
2  2  B  b
3  3  C  c
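If plyr isn't available, the same idea (stack the two tables, then keep the single non-NA entry per id) can be sketched in base R with split() and lapply(). This assumes character columns and, as in the question, exactly one non-NA value per id and column; `res` is just an illustrative name:

```r
a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'),
                stringsAsFactors = FALSE)
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA),
                stringsAsFactors = FALSE)

ab <- rbind(a, b)
colFun <- function(x){x[which(!is.na(x))]}

# one small data frame per id, each reduced to a single row
res <- do.call(rbind, lapply(split(ab, ab$id), function(g) {
  data.frame(id = g$id[1], v1 = colFun(g$v1), v2 = colFun(g$v2),
             stringsAsFactors = FALSE)
}))
rownames(res) <- NULL
res
```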

A similar strategy should work with data.tables as well:

library(data.table)
abDT <- data.table(ab, key = "id")
abDT[, list(colFun(v1), colFun(v2)), by = id]
     id V1 V2
[1,]  1  a  A
[2,]  2  B  b
[3,]  3  C  c
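With many value columns, listing each one in `j` gets tedious. A sketch that applies the same rule to every non-grouping column through `.SD`, assuming character columns and exactly one non-NA value per id and column (rbindlist() here also assumes both tables have the same columns in the same order):

```r
library(data.table)

a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'),
                stringsAsFactors = FALSE)
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA),
                stringsAsFactors = FALSE)

abDT <- rbindlist(list(a, b))  # stack the two tables
# .SD holds every column except the by column; keep each one's non-NA entry
res <- abDT[, lapply(.SD, function(x) x[!is.na(x)]), by = id]
res
```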


answered Oct 04 '22 22:10

joran

If your data is as simple as above, joran's answer is likely the simplest way. Here's my approach in base R:

a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'))
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA))

decider <- function(x, y) factor(ifelse(is.na(x), as.character(y), as.character(x)))
data.frame(mapply(a, b, FUN = decider))
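mapply() simplifies the per-column results into a matrix, so every column, including `id`, should come back as character (or factor) rather than its original type. A variant of the same decider idea using Map(), which returns a list and so keeps each column's type; it assumes `a` and `b` have the same columns, with rows already in the same id order:

```r
a <- data.frame(id = 1:3, v1 = c('a', NA, NA), v2 = c(NA, 'b', 'c'),
                stringsAsFactors = FALSE)
b <- data.frame(id = 1:3, v1 = c(NA, 'B', 'C'), v2 = c("A", NA, NA),
                stringsAsFactors = FALSE)

# take a's value unless it is NA, otherwise fall back to b's
decider <- function(x, y) ifelse(is.na(x), y, x)

# Map() walks the columns of a and b in parallel and returns a named list
ab <- as.data.frame(Map(decider, a, b), stringsAsFactors = FALSE)
ab
```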

If your data has different ids (some overlap and some do not), here's a different approach:

a <- data.frame(id = c(1,2,4,5), v1 = c('a', NA, "q", NA), v2 = c(NA, 'b', 'c', "e"))
b <- data.frame(id = 1:4, v1 = c(NA, "A", "C", 'B'), v2 = c("A", NA, "D", NA))

decider <- function(x, y) factor(ifelse(is.na(x), as.character(y), as.character(x)))

DF <- data.frame(mapply(a, b, FUN = decider))
DF2 <- rbind(b[!b$id %in% DF$id , ], DF)
DF2 <- DF2[order(DF2$id), ]
rownames(DF2) <- 1:nrow(DF2)
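Because mapply() lines up `a` and `b` by row position, this version can mix values from different ids when the two tables are not aligned (with this example data, id 5 picks up b's `v1` value for id 4). A sketch that aligns on `id` first with a full outer merge() and then collapses each `.x`/`.y` pair; where both tables supply a value (id 4's `v1` here), it keeps the one from `a`, which is an assumption. `DF3` is an illustrative name:

```r
a <- data.frame(id = c(1, 2, 4, 5), v1 = c('a', NA, "q", NA), v2 = c(NA, 'b', 'c', "e"),
                stringsAsFactors = FALSE)
b <- data.frame(id = 1:4, v1 = c(NA, "A", "C", 'B'), v2 = c("A", NA, "D", NA),
                stringsAsFactors = FALSE)

# full outer join on id; shared columns get .x (from a) and .y (from b) suffixes
m <- merge(a, b, by = "id", all = TRUE)
vars <- setdiff(intersect(names(a), names(b)), "id")

for (v in vars) {
  x <- m[[paste0(v, ".x")]]
  y <- m[[paste0(v, ".y")]]
  m[[v]] <- ifelse(is.na(x), y, x)  # prefer a, fall back to b
}
DF3 <- m[, c("id", vars)]
DF3
```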

answered Oct 04 '22 23:10

Tyler Rinker
