
Merging data frames and combining columns into one

Tags:

r

I've got the following three dataframes:

df1 <- data.frame(name=c("John", "Anne", "Christine", "Andy"),
                  age=c(31, 26, 54, 48),
                  height=c(180, 175, 160, 168),
                  group=c("Student",3,5,"Employer"), stringsAsFactors=FALSE)

df2 <- data.frame(name=c("Anne", "Christine"),
                  age=c(26, 54),
                  height=c(175, 160),
                  group=c(3,5),
                  group2=c("Teacher",6), stringsAsFactors=FALSE)

df3 <- data.frame(name=c("Christine"),
                  age=c(54),
                  height=c(160),
                  group=c(5),
                  group2=c(6),
                  group3=c("Scientist"), stringsAsFactors=FALSE)

I'd like to combine them so that I get the following result:

df.all <- data.frame(name=c("John", "Anne", "Christine", "Andy"),
                     age=c(31, 26, 54, 48),
                     height=c(180, 175, 160, 168),
                     group=c("Student", "Teacher", "Scientist", "Employer"))

At the moment I'm doing it this way:

df.all <- merge(merge(df1[,c(1,4)], df2[,c(1,5)], all=TRUE, by="name"),
                df3[,c(1,6)], all=TRUE, by="name")
row.ind <- which(df.all$group %in% c(3,5))
df.all[row.ind, c("group")] <- df.all[row.ind, c("group2")]
row.ind2 <- which(df.all$group2 %in% c(6))
df.all[row.ind2, c("group")] <- df.all[row.ind2, c("group3")]

This isn't generalisable and it is really messy. Maybe merge_all or merge_recurse (from the reshape package) could handle the merging step, especially as there might be more data frames to merge, but I haven't figured out how. Neither of these produces the right result:

df.all <- merge_all(list(df1, df2, df3))
df.all <- merge_recurse(list(df1, df2, df3), by=c("name"))

Is there a more general and elegant way to solve this problem?

AnjaM asked Jan 14 '23 18:01


1 Answer

Here is another possible approach, if I understand what you're ultimately after. (It is not clear what the numeric values in the "group" columns are, so I'm not sure this is exactly what you're looking for.)

Use Reduce() to merge your multiple data.frames.

temp <- Reduce(function(x, y) merge(x, y, all=TRUE), list(df1, df2, df3))
names(temp)[4] <- "group1" # Rename "group" to "group1" for reshaping 
temp
#        name age height   group1  group2    group3
# 1      Andy  48    168 Employer    <NA>      <NA>
# 2      Anne  26    175        3 Teacher      <NA>
# 3 Christine  54    160        5       6 Scientist
# 4      John  31    180  Student    <NA>      <NA>
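For reference, Reduce() folds a two-argument function over a list from left to right, so the call above should be equivalent to nesting merge() by hand. A minimal sketch, assuming three small toy data frames (a, b and d are hypothetical and not taken from the question):

```r
# Hypothetical toy data frames sharing the key column "id"
a <- data.frame(id = c(1, 2, 3), x = c("p", "q", "r"), stringsAsFactors = FALSE)
b <- data.frame(id = c(2, 3), y = c("s", "t"), stringsAsFactors = FALSE)
d <- data.frame(id = 3, z = "u", stringsAsFactors = FALSE)

# Reduce() applies the function to the first two elements, then to that
# result and the third element, and so on for longer lists
folded <- Reduce(function(x, y) merge(x, y, all = TRUE), list(a, b, d))

# The same merges written out by hand
nested <- merge(merge(a, b, all = TRUE), d, all = TRUE)

identical(folded, nested)
```

The Reduce() form does not change when a fourth or fifth data frame is added to the list, which is the generalisation the question asks for.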

Use reshape() to reshape your data from wide to long.

df.all <- reshape(temp, direction = "long", idvar="name", varying=4:6, sep="")
df.all
#                  name age height time     group
# Andy.1           Andy  48    168    1  Employer
# Anne.1           Anne  26    175    1         3
# Christine.1 Christine  54    160    1         5
# John.1           John  31    180    1   Student
# Andy.2           Andy  48    168    2      <NA>
# Anne.2           Anne  26    175    2   Teacher
# Christine.2 Christine  54    160    2         6
# John.2           John  31    180    2      <NA>
# Andy.3           Andy  48    168    3      <NA>
# Anne.3           Anne  26    175    3      <NA>
# Christine.3 Christine  54    160    3 Scientist
# John.3           John  31    180    3      <NA>
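If the number of group columns is not known in advance, the columns to stack could be picked by name rather than by position. A sketch under that assumption: temp is rebuilt by hand here from the printed output so the block runs on its own, and group.cols and long are illustrative names only.

```r
# temp as printed above, rebuilt by hand so the sketch is self-contained
temp <- data.frame(
  name   = c("Andy", "Anne", "Christine", "John"),
  age    = c(48, 26, 54, 31),
  height = c(168, 175, 160, 180),
  group1 = c("Employer", "3", "5", "Student"),
  group2 = c(NA, "Teacher", "6", NA),
  group3 = c(NA, NA, "Scientist", NA),
  stringsAsFactors = FALSE
)

# Select the columns to stack by name instead of by position (4:6)
group.cols <- grep("^group[0-9]+$", names(temp), value = TRUE)

# Stack them into a single "group" column, one block of rows per source column
long <- reshape(temp, direction = "long", idvar = "name",
                varying = group.cols, v.names = "group",
                timevar = "time", times = seq_along(group.cols))
nrow(long)
```

Naming v.names explicitly avoids relying on reshape() guessing the variable name from the column suffixes.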

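Take advantage of the fact that as.numeric() coerces non-numeric strings to NA (with a warning), so is.na(as.numeric(...)) flags the rows whose group is a label rather than a number. Rows where group was already NA are flagged as well, so use na.omit() to remove them.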

na.omit(df.all[is.na(as.numeric(df.all$group)), ])
#                  name age height time     group
# Andy.1           Andy  48    168    1  Employer
# John.1           John  31    180    1   Student
# Anne.2           Anne  26    175    2   Teacher
# Christine.3 Christine  54    160    3 Scientist
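The coercion behaviour the filter depends on can be checked in isolation. A small sketch on a hypothetical vector (group and as.num are illustrative names); suppressWarnings() is used only to silence the "NAs introduced by coercion" warning:

```r
# Hypothetical vector mixing labels, numeric codes and a missing value
group <- c("Employer", "3", NA, "Teacher", "6")

# Non-numeric strings become NA; a value that was already NA stays NA
as.num <- suppressWarnings(as.numeric(group))
as.num
# [1] NA  3 NA NA  6

# TRUE for labels and for missing values, FALSE for numeric codes
is.na(as.num)
# [1]  TRUE FALSE  TRUE  TRUE FALSE
```

The third element shows why a second step is needed: a missing value passes the is.na() test just like a label does.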

Again, this might be oversimplifying your problem. For example, na.omit() drops a row if any of its columns is NA, not just group, so NA values in other columns would remove rows you want to keep. Still, it might help direct you towards a solution to your problem.
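To illustrate the caveat about NA values in other columns, one possible variation is to test the group column alone instead of calling na.omit() on the whole data frame. This is a sketch on hypothetical long-format rows (long, is.label, with.na.omit and group.only are made-up names), not a drop-in replacement:

```r
# Hypothetical long-format rows; note the missing age for John
long <- data.frame(
  name  = c("John", "Anne", "Anne", "Andy"),
  age   = c(NA, 26, 26, 48),
  group = c("Student", "3", "Teacher", NA),
  stringsAsFactors = FALSE
)

# TRUE where group is neither a number nor missing
not.number <- is.na(suppressWarnings(as.numeric(long$group)))
is.label <- not.number & !is.na(long$group)

# na.omit() on the filtered rows also loses John because of his missing age
with.na.omit <- na.omit(long[not.number, ])

# Testing the group column alone keeps him
group.only <- long[is.label, ]
group.only$name
# [1] "John" "Anne"
```

Which behaviour is right depends on whether rows with incomplete age or height should appear in the final result.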

A5C1D2H2I1M1N2O1R2T1 answered Jan 20 '23 04:01