Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge rows with equal and unequal data

Tags:

r

reshape

I am struggling to merge some messy data.

I have a single data frame like this:

df <- data.frame(name = c("A", "A", "B", "B", "C", "C"), 
                 number = c(1, 1, 2, 2, 3, 3), 
                 product = c("fixed", "variable", "aggregate", "variable", "fixed", "fixed"), 
                 vol = c(1, 9, 2, 6, 4, 7)
                 )

Here is what I am working towards:

result <- data.frame(name = c("A", "B", "C"), 
                     number = c(1, 2, 3), 
                     new_product = c("fixed variable", "aggregate variable", "fixed"), 
                     vol = c(10, 8, 11) 
                     )

My problem is I need to merge all the equal rows in the data-frame. And if they are not unique I need to merge them into a name like the one from results.

I've tried a dplyr, but in dplyr I can't get the new_product to merge in any meaningful way because I cant reference the same column again.

df %>% group_by(name) %>% summarize (name = name, 
number = number, 
newproduct = paste(product, product) # ???? 

Any help much appreciated!

like image 675
Thorst Avatar asked Apr 21 '15 13:04

Thorst


3 Answers

Here's how I would approach this using data.table, though I'm not sure about how you defined number

library(data.table)
result <- setDT(df)[,.(new_product = toString(unique(product)), vol = sum(vol)), by = name]
result[, number := .I]
result
#    name         new_product vol number
# 1:    A     fixed, variable  10      1
# 2:    B aggregate, variable   8      2
# 3:    C               fixed  11      3

Note: You can use paste(unique(product), collapse = " ") instead of toString if you like the output better.

Or similarly with dplyr

df %>% 
  group_by(name) %>% 
  summarise(new_product = toString(unique(product)), vol=sum(vol)) %>% 
  mutate(number = row_number())
like image 54
David Arenburg Avatar answered Nov 15 '22 00:11

David Arenburg


Here are two more purely base ways:

df <- data.frame(name = c("A", "A", "B", "B", "C", "C"), 
                 number = rep(1:3, times = 2, each = 1), 
                 product = c("fixed", "variable", "aggregate", "variable", "fixed", "fixed"), 
                 vol = c(1, 9, 2, 6, 4, 7)
)
  1. this one is just using ave to act on the original data frame and then remove the duplicates

within(df, {
  new_product <- ave(seq_along(name), name, FUN = function(x) 
    toString(unique(df[x, 'product'])))
  vol <- ave(vol, name, FUN = sum)
  product <- NULL
})[!duplicated(df$name), ]

#   name number vol         new_product
# 1    A      1  10     fixed, variable
# 3    B      3   8 aggregate, variable
# 5    C      2  11               fixed
  1. this one is a more round-about way, creating the new_product with aggregate and then matching back to the original, and finally using aggregate again to get the sum by group

(tmp <- aggregate(product ~ name, df, function(x)
  paste0(unique(x), collapse = ' ')))
#   name            product
# 1    A     fixed variable
# 2    B aggregate variable
# 3    C              fixed

df$new_product <- tmp[match(df$name, tmp$name), 'product']
res <- aggregate(vol ~ name + new_product, df, sum)
within(res[order(res$name), ], {
  number <- 1:nrow(res)
})

#   name        new_product vol number
# 3    A     fixed variable  10      1
# 1    B aggregate variable   8      2
# 2    C              fixed  11      3
like image 42
rawr Avatar answered Nov 15 '22 00:11

rawr


Other people replied already, but here my solution:

df %>% 
  group_by (name) %>%
  summarise(
    new_product = paste (unique(product), collapse=" "),
    vol = sum(vol)) %>%
  mutate(number = row_number()) %>%
  select(name, number, new_product, vol)
like image 40
mucio Avatar answered Nov 15 '22 00:11

mucio