I want to aggregate two columns of a data frame by name, in the following somewhat special way:
Drop the parts column in the result by specially aggregating the two columns fruits and parts.
The parts values for Apple, Banana and Strawberry don't matter and everything gets summarized, whereas the parts values of Grape and Kiwi should become the new fruits names.
This may sound dead simple at first sight, but after hours of trial and error I didn't find any useful solution. Here's the example:
library(lubridate)  # provides today()
theDF <- data.frame(dates = as.Date(c(today()+20)),
                    fruits = c("Apple","Apple","Apple","Apple","Banana","Banana","Banana","Banana",
                               "Strawberry","Strawberry","Strawberry","Strawberry","Grape", "Grape",
                               "Grape","Grape", "Kiwi","Kiwi","Kiwi","Kiwi"),
                    parts = c("Big Green Apple","Apple2","Blue Apple","XYZ Apple4",
                              "Yellow Banana1","Small Banana","Banana3","Banana4",
                              "Red Small Strawberry","Red StrawberryY","Big Strawberry",
                              "StrawberryZ","Green Grape", "Blue Grape", "Blue Grape",
                              "Blue Grape","Big Kiwi","Small Kiwi","Big Kiwi","Middle Kiwi"),
                    stock = as.vector(sample(1:20)) )
The current data frame:

The desired output:

We can use data.table. If the part of 'parts' to be removed follows a pattern, such as a trailing capital letter or digit, we can strip it with sub and use the result as a grouping variable together with 'dates' to get the sum of 'stock' (using the data given at the end of this answer).
library(data.table)
setDT(theDF)[,.(stock = sum(stock)) , .(dates, fruits = sub("([0-9]|[A-Z])$", "", parts))]
#         dates      fruits stock
#1: 2016-06-19       Apple    46
#2: 2016-06-19      Banana    35
#3: 2016-06-19  Strawberry    38
#4: 2016-06-19 Green Grape    12
#5: 2016-06-19  Blue Grape    21
#6: 2016-06-19    Big Kiwi    37
#7: 2016-06-19  Small Kiwi    14
#8: 2016-06-19 Middle Kiwi     7
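To see what the sub call does on its own, here is a minimal base R sketch. The sample vector is made up for illustration; it mixes values from the data at the end of this answer with one value ("Big Green Apple") from the question's original data frame.

```r
# Illustrative values only; not the full 'parts' column.
parts <- c("Apple1", "Banana4", "StrawberryX",
           "Green Grape", "Big Kiwi", "Big Green Apple")

# Remove a single trailing digit or capital letter, if present.
cleaned <- sub("([0-9]|[A-Z])$", "", parts)
print(cleaned)
```

Values that end in a lowercase letter are left untouched, so a value such as "Big Green Apple" would not collapse to "Apple" with this pattern. For data like that, the manual approach is probably the safer choice.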
The same approach works with dplyr.
library(dplyr)
theDF %>%
  group_by(dates, fruits = sub('([0-9]|[A-Z])$', '', parts)) %>%
  summarise(stock = sum(stock))
If there is no pattern and the elements of 'fruits' have to be identified manually, create a vector of those elements and use %chin% to get a logical index in 'i'. Then assign (:=) the values of 'parts' at those rows to 'fruits', group by 'dates' and 'fruits', and get the sum of 'stock'. The as.character() call is there because 'fruits' is a factor in this data and %chin% works on character vectors.
setDT(theDF)[as.character(fruits) %chin% c("Grape", "Kiwi"),
             fruits := parts][, .(stock = sum(stock)), .(dates, fruits)]
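For comparison, the same manual idea can be sketched in base R without any packages. This is not part of the data.table solution above; the small data frame `df` and the names `use_parts` and `res` are made up for illustration, and the columns are kept as character rather than factor to keep the replacement simple.

```r
# Small illustrative data frame (hypothetical subset of the real data).
df <- data.frame(
  dates  = as.Date("2016-06-19"),
  fruits = c("Apple", "Apple", "Grape", "Grape", "Kiwi"),
  parts  = c("Apple1", "Apple2", "Green Grape", "Blue Grape", "Big Kiwi"),
  stock  = c(8, 19, 12, 13, 17),
  stringsAsFactors = FALSE
)

# Rows whose 'fruits' value should be replaced by the 'parts' value.
use_parts <- df$fruits %in% c("Grape", "Kiwi")
df$fruits[use_parts] <- df$parts[use_parts]

# Sum 'stock' per combination of 'dates' and the new 'fruits'.
res <- aggregate(stock ~ dates + fruits, data = df, FUN = sum)
print(res)
```

Here the two Apple rows are summed into one row, while each Grape and Kiwi part keeps its own row.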
Data used for the output above:
theDF <- structure(list(dates = structure(c(16971, 16971, 16971, 16971,
16971, 16971, 16971, 16971, 16971, 16971, 16971, 16971, 16971,
16971, 16971, 16971, 16971, 16971, 16971, 16971), class = "Date"),
fruits = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 5L,
5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("Apple",
"Banana", "Grape", "Kiwi", "Strawberry"), class = "factor"),
parts = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 14L,
15L, 16L, 16L, 11L, 10L, 10L, 10L, 9L, 13L, 9L, 12L), .Label = c("Apple1",
"Apple2", "Apple3", "Apple4", "Banana1", "Banana2", "Banana3",
"Banana4", "Big Kiwi", "Blue Grape", "Green Grape", "Middle Kiwi",
"Small Kiwi", "StrawberryX", "StrawberryY", "StrawberryZ"
), class = "factor"), stock = c(8, 19, 15, 4, 6, 18, 1, 10,
9, 16, 11, 2, 12, 13, 5, 3, 17, 14, 20, 7)), .Names = c("dates",
"fruits", "parts", "stock"), row.names = c(NA, -20L), class = "data.frame")
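The structure above stores 'fruits' and 'parts' as factors, which matters when values are copied from one column to the other in plain base R. The following sketch, with made-up three-element vectors, shows why converting to character first is a reasonable precaution there.

```r
# Hypothetical miniature versions of the two factor columns.
fruits <- factor(c("Apple", "Grape", "Kiwi"))
parts  <- factor(c("Apple1", "Green Grape", "Big Kiwi"))

# Assigning a value that is not an existing level gives NA (plus a warning).
bad <- fruits
suppressWarnings(bad[2] <- "Green Grape")
print(is.na(bad[2]))

# Converting both columns to character avoids the problem.
fruits_chr <- as.character(fruits)
parts_chr  <- as.character(parts)
idx <- fruits_chr %in% c("Grape", "Kiwi")
fruits_chr[idx] <- parts_chr[idx]
print(fruits_chr)
```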