I want to aggregate two columns of a data frame by name, in the following somewhat special way: the result should have no separate parts column, because the two columns fruits and parts get aggregated together. The parts values for Apple, Banana and Strawberry don't matter and everything gets summarized, while the parts values of Grape and Kiwi should become the new fruits name.

This may sound dead simple at first sight, but after hours of trial and error I didn't find any useful solution. Here's the example:
library(lubridate)  # today() comes from lubridate

theDF <- data.frame(dates = as.Date(c(today()+20)),
                    fruits = c("Apple","Apple","Apple","Apple","Banana","Banana","Banana","Banana",
                               "Strawberry","Strawberry","Strawberry","Strawberry","Grape", "Grape",
                               "Grape","Grape", "Kiwi","Kiwi","Kiwi","Kiwi"),
                    parts = c("Big Green Apple","Apple2","Blue Apple","XYZ Apple4",
                              "Yellow Banana1","Small Banana","Banana3","Banana4",
                              "Red Small Strawberry","Red StrawberryY","Big Strawberry",
                              "StrawberryZ","Green Grape", "Blue Grape", "Blue Grape",
                              "Blue Grape","Big Kiwi","Small Kiwi","Big Kiwi","Middle Kiwi"),
                    stock = as.vector(sample(1:20)) )  # random permutation, differs between runs
The current data frame:
The desired output:
We can use data.table. If there is a pattern in the 'parts' column, such as a trailing capital letter or digit that should be removed, we can strip it with sub, use the result as a grouping variable along with 'dates', and get the sum of 'stock'.
library(data.table)
setDT(theDF)[,.(stock = sum(stock)) , .(dates, fruits = sub("([0-9]|[A-Z])$", "", parts))]
# dates fruits stock
#1: 2016-06-19 Apple 46
#2: 2016-06-19 Banana 35
#3: 2016-06-19 Strawberry 38
#4: 2016-06-19 Green Grape 12
#5: 2016-06-19 Blue Grape 21
#6: 2016-06-19 Big Kiwi 37
#7: 2016-06-19 Small Kiwi 14
#8: 2016-06-19 Middle Kiwi 7
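To see what the sub call does on its own, here is a minimal base R sketch. The vector parts_example is a hypothetical sample made up of values from the 'parts' column of the data shown at the end of this answer; it is not part of the original code.

```r
# Hypothetical sample of 'parts' values (taken from the data at the end of the answer)
parts_example <- c("Apple1", "StrawberryX", "Blue Grape", "Big Kiwi")

# Remove a single trailing digit or capital letter, if there is one
cleaned <- sub("([0-9]|[A-Z])$", "", parts_example)

print(cleaned)
# [1] "Apple"      "Strawberry" "Blue Grape" "Big Kiwi"
```

Values such as "Blue Grape" end in a lowercase letter, so they pass through unchanged and end up as their own groups.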
Or, using dplyr, we can implement the same approach.
library(dplyr)
theDF %>%
   group_by(dates, fruits = sub('([0-9]|[A-Z])$', '', parts)) %>%
   summarise(stock = sum(stock))
If there is no pattern and the elements of 'fruits' have to be identified manually, create a vector of those elements, use %chin% to get the logical index in 'i', assign (:=) the values of 'parts' for those rows to 'fruits', then group by 'dates' and 'fruits' and get the sum of 'stock'.
setDT(theDF)[as.character(fruits) %chin% c("Grape", "Kiwi"),
fruits := parts][, .(stock = sum(stock)), .(dates, fruits)]
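The same idea can also be sketched in base R, without data.table. This is only an illustration under some assumptions: df and keep_parts are hypothetical names, the data is a small hand-made subset with character (not factor) columns, and the 'dates' column is left out to keep the sketch short.

```r
# Small hand-made subset; column types are character, not factor
df <- data.frame(
  fruits = c("Apple", "Apple", "Grape", "Grape", "Kiwi"),
  parts  = c("Apple1", "Apple2", "Green Grape", "Blue Grape", "Big Kiwi"),
  stock  = c(8, 19, 12, 13, 17),
  stringsAsFactors = FALSE
)

# Fruits whose 'parts' value should replace the 'fruits' name
keep_parts <- c("Grape", "Kiwi")

# For the selected fruits take 'parts', otherwise keep 'fruits'
df$fruits <- ifelse(df$fruits %in% keep_parts, df$parts, df$fruits)

# Sum 'stock' within each resulting group
res <- aggregate(stock ~ fruits, data = df, FUN = sum)
print(res)
# Apple sums to 27; Big Kiwi, Blue Grape and Green Grape each keep their own row
```

With factor columns, as in the data below, the columns would need as.character() first, because ifelse() on factors returns the integer codes.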
Data used in the examples above:
theDF <- structure(list(dates = structure(c(16971, 16971, 16971, 16971,
16971, 16971, 16971, 16971, 16971, 16971, 16971, 16971, 16971,
16971, 16971, 16971, 16971, 16971, 16971, 16971), class = "Date"),
fruits = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 5L,
5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("Apple",
"Banana", "Grape", "Kiwi", "Strawberry"), class = "factor"),
parts = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 14L,
15L, 16L, 16L, 11L, 10L, 10L, 10L, 9L, 13L, 9L, 12L), .Label = c("Apple1",
"Apple2", "Apple3", "Apple4", "Banana1", "Banana2", "Banana3",
"Banana4", "Big Kiwi", "Blue Grape", "Green Grape", "Middle Kiwi",
"Small Kiwi", "StrawberryX", "StrawberryY", "StrawberryZ"
), class = "factor"), stock = c(8, 19, 15, 4, 6, 18, 1, 10,
9, 16, 11, 2, 12, 13, 5, 3, 17, 14, 20, 7)), .Names = c("dates",
"fruits", "parts", "stock"), row.names = c(NA, -20L), class = "data.frame")