
Combine duplicate rows in dataframe and create new columns

Tags:

dataframe

r

I am trying to aggregate rows in a data frame where some columns repeat across rows and others differ, as below:

  dataframe1 <- data.frame(Company_Name = c("KFC", "KFC", "KFC", "McD", "McD"),
                           Company_ID = c(1, 1, 1, 2, 2),
                           Company_Phone = c("237389", "-", "-", "237002", "-"),
                           Employee_Name = c("John", "Mary", "Jane", "Joshua", "Anne"),
                           Employee_ID = c(1001, 1002, 1003, 2001, 2002))

I wish to combine the rows that share the same company values into one row per company, and create new columns for the values that differ, as below:

  dataframe2 <- data.frame(Company_Name = c("KFC", "McD"),
                           Company_ID = c(1, 2),
                           Company_Phone = c("237389", "237002"),
                           Employee_Name1 = c("John", "Joshua"),
                           Employee_ID1 = c(1001, 2001),
                           Employee_Name2 = c("Mary", "Anne"),
                           Employee_ID2 = c(1002, 2002),
                           Employee_Name3 = c("Jane", "na"),
                           Employee_ID3 = c(1003, "na"))

I have checked similar questions such as "Combining duplicated rows in R and adding new column containing IDs of duplicates" and "R: collapse rows and then convert row into a new column", but I do not wish to separate the values by commas; I want to create new columns instead. The expected result is:

 # Company_Name Company_ID Company_Phone Employee_Name1 Employee_ID1 Employee_Name2 Employee_ID2 Employee_Name3 Employee_ID3
 #1          KFC          1        237389           John         1001           Mary         1002           Jane         1003
 #2          McD          2        237002         Joshua         2001           Anne         2002             na           na

Thank you in advance.

R noob asked Dec 24 '22 08:12



1 Answer

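A solution using the tidyverse: replace the "-" placeholders with NA and fill in the phone number, number the employees within each company, then reshape the Employee_ columns from long to wide. dat is the final output.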

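library(tidyverse)

dat <- dataframe1 %>%
  # Convert factor columns to character (data.frame() creates factors in R < 4.0)
  mutate_if(is.factor, as.character) %>%
  # Treat "-" as a missing phone number
  mutate(Company_Phone = ifelse(Company_Phone %in% "-", NA, Company_Phone)) %>%
  # Group first so the phone number is filled within a company, not across companies
  group_by(Company_ID) %>%
  fill(Company_Phone) %>%
  # Number the employees within each company: 1, 2, 3, ...
  mutate(ID = row_number()) %>%
  # Stack Employee_Name and Employee_ID into key/value pairs (values become character)
  gather(Info, Value, starts_with("Employee_")) %>%
  # Build the new column names, e.g. Employee_Name1
  unite(New_Col, Info, ID, sep = "") %>%
  # Spread the key/value pairs into one column per employee slot
  spread(New_Col, Value) %>%
  # Reorder the columns (hard-coded here for at most three employees per company)
  select(c("Company_Name", "Company_ID", "Company_Phone",
           paste0(rep(c("Employee_ID", "Employee_Name"), 3), rep(1:3, each = 2)))) %>%
  ungroup()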

# View the result
dat %>% as.data.frame(stringsAsFactors = FALSE)
#   Company_Name Company_ID Company_Phone Employee_ID1 Employee_Name1 Employee_ID2 Employee_Name2 Employee_ID3 Employee_Name3
# 1          KFC          1        237389         1001           John         1002           Mary         1003           Jane
# 2          McD          2        237002         2001         Joshua         2002           Anne         <NA>           <NA>
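
For comparison, the same reshape can probably be written more compactly with pivot_wider(), which supersedes gather()/spread() in newer tidyr releases. This is only a sketch, not part of the original answer: it assumes tidyr 1.2.0 or later (for the names_vary argument) and a recent dplyr, and the name wide is just an illustrative choice. Unlike the version above, it keeps Employee_ID numeric and does not hard-code the number of employees.

```r
library(dplyr)
library(tidyr)

# Same input as in the question; stringsAsFactors = FALSE keeps the
# text columns as character on R versions before 4.0 as well.
dataframe1 <- data.frame(
  Company_Name  = c("KFC", "KFC", "KFC", "McD", "McD"),
  Company_ID    = c(1, 1, 1, 2, 2),
  Company_Phone = c("237389", "-", "-", "237002", "-"),
  Employee_Name = c("John", "Mary", "Jane", "Joshua", "Anne"),
  Employee_ID   = c(1001, 1002, 1003, 2001, 2002),
  stringsAsFactors = FALSE
)

wide <- dataframe1 %>%
  # Treat "-" as a missing phone number
  mutate(Company_Phone = na_if(Company_Phone, "-")) %>%
  group_by(Company_ID) %>%
  # Fill the phone within each company, in both directions
  fill(Company_Phone, .direction = "downup") %>%
  # Number the employees within each company
  mutate(ID = row_number()) %>%
  ungroup() %>%
  # One pair of Employee_Name<k> / Employee_ID<k> columns per employee slot
  pivot_wider(
    names_from  = ID,
    values_from = c(Employee_Name, Employee_ID),
    names_sep   = "",
    names_vary  = "slowest"  # assumption: needs tidyr >= 1.2.0
  )

print(as.data.frame(wide))
```

With names_vary = "slowest" the columns come out paired per employee (Employee_Name1, Employee_ID1, Employee_Name2, ...), so no manual select() is needed, and companies with fewer employees get NA in the unused slots.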
www answered Apr 12 '23 22:04
