If I have a dataframe with a key column and data columns, like this
df <- cbind(key=c("Jane", "Jane", "Sam", "Sam", "Mary"), var1=c("a", NA, "a", "a", "c"), var2=c(NA, "b", NA, "b", "d"))
key var1 var2
"Jane" "a" NA
"Jane" NA "b"
"Sam" "a" NA
"Sam" "a" "b"
"Mary" "c" "d"
"Mary" "c" NA
And want a dataframe that merges the rows by name, overwriting NAs whenever possible, like so
key var1 var2
"Jane" "a" "b"
"Sam" "a" "b"
"Mary" "c" "d"
How can I do this?
We can use the concat function in pandas to append either columns or rows from one DataFrame to another. Let's grab two subsets of our data to see how this works. When we concatenate DataFrames, we need to specify the axis. axis=0 tells pandas to stack the second DataFrame UNDER the first one.
Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes. If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.
Merging Dataframes by index of both the dataframes As both the dataframe contains similar IDs on the index. So, to merge the dataframe on indices pass the left_index & right_index arguments as True i.e. Both the dataframes are merged on index using default Inner Join.
library(data.table)
dtt <- as.data.table(df)
dtt[, list(var1=unique(var1[!is.na(var1)])
, var2=unique(var2[!is.na(var2)]))
, by=key]
key var1 var2
1: Jane a b
2: Mary c d
3: Sam a b
Here's a solution using dplyr
. Note that cbind()
creates matrices, not data frames, so I've modified the code to do what I think you meant. I also pulled out the selection algorithm into a separate function. I think this is good practice because it allows you to change your algorithm in one place if you discover you need something different.
df <- data.frame(
key = c("Jane", "Jane", "Sam", "Sam", "Mary"),
var1 = c("a", NA, "a", "a", "c"),
var2 = c(NA, "b", NA, "b", "d"),
stringsAsFactors = FALSE
)
library(dplyr)
collapse <- function(x) x[!is.na(x)][1]
df %.%
group_by(key) %.%
summarise(var1 = collapse(var1), var2 = collapse(var2))
# Source: local data frame [3 x 3]
#
# key var1 var2
# 1 Mary c d
# 2 Sam a b
# 3 Jane a b
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With