I have the following data.frame
:
id name altNames
1001 Joan character(0)
1002 Jane c("Janie", "Janet", "Jan")
1003 John Jon
1004 Bill Will
1005 Tom character(0)
The column altNames
could be empty (i.e. character(0)), have just one name, or a list of names. What I want is a data.frame
(or a list) where each entry from name
and/or altNames
appears just once along with the corresponding id
, like this:
id name
1001 Joan
1002 Jane
1002 Janie
1002 Janet
1002 Jan
1003 John
1003 Jon
1004 Bill
1004 Will
1005 Tom
What's the most efficient way of doing it? Even better is dplyr
is utilized.
Thanks
Edit: Here's the data:
df <- data_frame(
id = c("1001", "1002","1003", "1004", "1005"),
name = c("Joan", "Jane", "John", "Bill", "Tom"),
altNames = list(character(0), c("Janie", "Janet", "Jan"), "Jon", "Will", character(0))
)
Here's a possible data.table
approach
library(data.table)
setDT(dat)[, .(name = c(name, unlist(altNames))), by = id]
# id name
# 1: 1001 Joan
# 2: 1002 Jane
# 3: 1002 Janie
# 4: 1002 Janet
# 5: 1002 Jan
# 6: 1003 John
# 7: 1003 Jon
# 8: 1004 Bill
# 9: 1004 Will
# 10: 1005 Tom
A base R version (using the df
added by @rawr)
with(df, {
ns <- mapply(c, name, altNames)
data.frame(id = rep(id, times=lengths(ns)), name=unlist(ns), row.names=NULL)
})
# id name
#1 1001 Joan
#2 1002 Jane
#3 1002 Janie
#4 1002 Janet
#5 1002 Jan
#6 1003 John
#7 1003 Jon
#8 1004 Bill
#9 1004 Will
#10 1005 Tom
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With