I'm trying to figure out how to use merge() to update a data frame.
Take, for example, the data frame foo:

```r
foo <- data.frame(index=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA))
```
Which has the following values:

```
  index value
1     a   100
2     b   101
3     c    NA
4     d    NA
```
And the data frame bar:

```r
bar <- data.frame(index=c('c', 'd'), value=c(200, 201))
```
Which has the following values:

```
  index value
1     c   200
2     d   201
```
When I run the following merge() call to update the values for c and d:

```r
merge(foo, bar, by='index', all=T)
```
It results in this output:

```
  index value.x value.y
1     a     100      NA
2     b     101      NA
3     c      NA     200
4     d      NA     201
```
I'd like merge() to avoid creating, in this specific example, value.x and value.y, and instead retain only the original value column, with the values for c and d filled in from bar.
Is there a simple way of doing this?
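If you do want to stay with merge(), one option is to let it create the suffixed columns and then collapse them. The .x/.y columns appear because both data frames have a non-key column named value, and merge() disambiguates them with its suffixes argument (c(".x", ".y") by default). The sketch below assumes bar's value should win wherever it is not NA; the name merged is purely illustrative.

```r
foo <- data.frame(index = c("a", "b", "c", "d"), value = c(100, 101, NA, NA))
bar <- data.frame(index = c("c", "d"), value = c(200, 201))

# Full outer join on index; the two "value" columns become value.x and value.y
merged <- merge(foo, bar, by = "index", all = TRUE)

# Assumption: prefer bar's value (value.y) when present, otherwise keep foo's (value.x)
merged$value <- ifelse(is.na(merged$value.y), merged$value.x, merged$value.y)

# Drop the suffixed helper columns so only index and value remain
merged <- merged[, c("index", "value")]
print(merged)
```

This works, but it is more roundabout than updating foo in place, since merge() is designed to combine columns rather than overwrite them.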
Doesn't merge() always bind columns together? Does replace() work?

```r
foo$value <- replace(foo$value, foo$index %in% bar$index, bar$value)
```
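A quick check of the replace() approach with the data from the question. replace() writes bar$value into the positions where foo$index is found in bar$index, taken in foo's row order, so it implicitly assumes bar lists its keys in that same order. The second half reverses bar (foo2 and bar_rev are illustrative names) to show what happens when that assumption fails.

```r
foo <- data.frame(index = c("a", "b", "c", "d"), value = c(100, 101, NA, NA))
bar <- data.frame(index = c("c", "d"), value = c(200, 201))

# Rows c and d of foo are matched; bar$value is written into them in foo's order
foo$value <- replace(foo$value, foo$index %in% bar$index, bar$value)
print(foo)

# Same update, but with bar's rows in the opposite order
foo2 <- data.frame(index = c("a", "b", "c", "d"), value = c(100, 101, NA, NA))
bar_rev <- bar[c(2, 1), ]
foo2$value <- replace(foo2$value, foo2$index %in% bar_rev$index, bar_rev$value)
print(foo2)  # c gets 201 and d gets 200: the values are swapped
```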
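Or use match() if the order matters, that is, if the rows of bar may not be in the same order as the matching rows of foo:

```r
foo$value[match(bar$index, foo$index)] <- bar$value
```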
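The match() version looks up each key of bar in foo, so each value goes to the row with the same index regardless of how bar is sorted. This sketch deliberately lists bar in the opposite order to foo:

```r
foo <- data.frame(index = c("a", "b", "c", "d"), value = c(100, 101, NA, NA))
# bar is listed in reverse order on purpose
bar <- data.frame(index = c("d", "c"), value = c(201, 200))

# match() gives the row position in foo of each bar$index (here 4, then 3),
# so each bar$value is written to the row with the same key
foo$value[match(bar$index, foo$index)] <- bar$value
print(foo)
```

This assumes every key in bar also appears in foo. If one does not, match() returns NA for it, and that NA position should be filtered out before assigning.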