Trying to merge two data frames without using merge() and using union(), match() or %in%

Tags: r

I want to construct two data frames and merge them without using any form of merge(). Instead, I need to use the set operation union() together with match() or the %in% operator. The output must display the contents of d1 and d2 and the result of merging them.

I have figured out how to do this with merge(), but I cannot work out how to do it with union() and match() or %in%, or in any other way. My output also doesn't match the expected output. I'm a beginner, so thanks for your help.

d1.Kids <- c("Jack", "Jill", "Jillian", "John", "James")
d1.States <- c("CA", "MA", "DE", "HI", "PA")

d1 <- data.frame(d1.Kids, d1.States, stringsAsFactors = FALSE)

d2.Ages <- c(10, 7, 12, 30)
d2.Kids <- c("Jill", "Jillian", "Jack", "Mary")

d2 <- data.frame(d2.Ages, d2.Kids, stringsAsFactors = FALSE)

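# Merge the two data frames, keeping unmatched rows from both sides
# (the result is named `merged` so it does not mask the merge() function)
merged <- merge(d1, d2, by.x = "d1.Kids", by.y = "d2.Kids", all = TRUE)

print(merged)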

Output should be:

  kids    ages states 
1 Jack    12   CA
2 Jill    10   MA
3 Jillian 7    DE
4 John    NA   HI
5 James   NA   PA
6 Mary    30   NA
Pat8 asked Jun 22 '19 15:06



2 Answers

Something like this will do what the question asks for.
It looks long, but it is the same pair of steps repeated for each data frame to be merged: create a column of NA values, then fill it at the positions returned by match().

Kids <- union(d1$d1.Kids, d2$d2.Kids)

States <- rep(NA_character_, length(Kids))
Ages <- rep(NA_real_, length(Kids))

States[match(d1$d1.Kids, Kids)] <- as.character(d1$d1.States)
Ages[match(d2$d2.Kids, Kids)] <- d2$d2.Ages

mrg <- data.frame(Kids, States, Ages)

mrg
#     Kids States Ages
#1    Jack     CA   12
#2    Jill     MA   10
#3 Jillian     DE    7
#4    John     HI   NA
#5   James     PA   NA
#6    Mary   <NA>   30
Rui Barradas answered Oct 15 '22 07:10
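
The union()/match() idea can also be written with match() used as a lookup in the other direction, which avoids pre-filling the columns with NA. This is only a sketch: it assumes each kid appears at most once in each data frame, and the names `Kids` and `mrg2` are illustrative. The data frames are rebuilt from the question so the block runs on its own.

```r
# Data frames from the question, rebuilt so the block is self-contained
d1 <- data.frame(d1.Kids = c("Jack", "Jill", "Jillian", "John", "James"),
                 d1.States = c("CA", "MA", "DE", "HI", "PA"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(d2.Ages = c(10, 7, 12, 30),
                 d2.Kids = c("Jill", "Jillian", "Jack", "Mary"),
                 stringsAsFactors = FALSE)

# All keys from both data frames, in first-seen order
Kids <- union(d1$d1.Kids, d2$d2.Kids)

# match(Kids, x) returns the position of each key in x, or NA if absent.
# Indexing a vector with NA yields NA, so missing values fill themselves in.
mrg2 <- data.frame(
  kids   = Kids,
  ages   = d2$d2.Ages[match(Kids, d2$d2.Kids)],
  states = d1$d1.States[match(Kids, d1$d1.Kids)],
  stringsAsFactors = FALSE
)

print(mrg2)
```

Because union() keeps the keys in first-seen order, the rows come out in the order the question expects (Jack, Jill, Jillian, John, James, Mary) rather than sorted by name.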


Using base R:

# Collect every kid name that appears in either data frame
kids <- unique(c(d1$d1.Kids, d2$d2.Kids))
d3 <- data.frame(Kids = kids, ages = NA_real_, states = NA_character_,
                 stringsAsFactors = FALSE)
for (i in seq_along(kids)) {
  # Copy the age when this kid appears in d2
  if (kids[i] %in% d2$d2.Kids) {
    d3$ages[i] <- d2$d2.Ages[d2$d2.Kids == kids[i]]
  }
  # Copy the state when this kid appears in d1
  if (kids[i] %in% d1$d1.Kids) {
    d3$states[i] <- d1$d1.States[d1$d1.Kids == kids[i]]
  }
}
Dij answered Oct 15 '22 07:10
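
For the %in% route the question mentions, one possible loop-free sketch is to start from d1, look up the ages with match(), and then append the d2 rows whose kid is not %in% d1. The names `extra` and `res` are illustrative, and the sketch assumes kid names are unique within each data frame. The data frames are rebuilt from the question so the block runs on its own.

```r
# Data frames from the question, rebuilt so the block is self-contained
d1 <- data.frame(d1.Kids = c("Jack", "Jill", "Jillian", "John", "James"),
                 d1.States = c("CA", "MA", "DE", "HI", "PA"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(d2.Ages = c(10, 7, 12, 30),
                 d2.Kids = c("Jill", "Jillian", "Jack", "Mary"),
                 stringsAsFactors = FALSE)

# Rows of d2 whose kid does not appear in d1 (here only Mary)
extra <- d2[!(d2$d2.Kids %in% d1$d1.Kids), ]

# Keep every d1 row, attach ages by lookup, then append the unmatched d2 rows
res <- data.frame(
  kids   = c(d1$d1.Kids, extra$d2.Kids),
  ages   = c(d2$d2.Ages[match(d1$d1.Kids, d2$d2.Kids)], extra$d2.Ages),
  states = c(d1$d1.States, rep(NA_character_, nrow(extra))),
  stringsAsFactors = FALSE
)

print(res)
```

The %in% test only answers whether a key is present, so match() is still needed to find where the matching age lives.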