I have a df which has data like this:
sub = c("X001","X002", "X001","X003","X002","X001","X001","X003","X002","X003","X003","X002")
month = c("201506", "201507", "201506","201507","201507","201508", "201508","201507","201508","201508", "201508", "201508")
tech = c("mobile", "tablet", "PC","mobile","mobile","tablet", "PC","tablet","PC","PC", "mobile", "tablet")
brand = c("apple", "samsung", "dell","apple","samsung","apple", "samsung","dell","samsung","dell", "dell", "dell")
revenue = c(20, 15, 10,25,20,20, 17,9,14,12, 9, 11)
df = data.frame(sub, month, brand, tech, revenue)
I want to use sub and month as key and get one row for every subscriber per month which displays the sum of revenues for the unique values in tech and brand for that subscriber for that month. This example is simplified and with less columns as I have a huge data set I decided to try doing it with data.table
.
I have managed to do this for one catagorical column, either tech or brand using this:
df1 <- dcast(df, sub + month ~ tech, fun=sum, value.var = "revenue")
but I want to do it for two or more caqtogorical columns, so far I've tried this:
df2 <- dcast(df, sub + month ~ tech+brand, fun=sum, value.var = "revenue")
and it just concatenates the unique values of both catogorical columns and sums for that but I do not want that. I wan seperate columns for each unique value of all catogorical columns.
I'm new to R and would really appreciate any help.
(I will assume that df
is a data.table
rather a data.frame
like in your example.)
One possible solution for this is to first melt
the data while keeping sub
, month
and revenue
as keys. This way, brand
and tech
will be converted to a single variable with a value corresponding to each existing combination of the keys. This way we will be able to easily dcast
it back as we will be operating against a single column- like in your first example
dcast(melt(df, c(1:2, 5)), sub + month ~ value, sum, value.var = "revenue")
# sub month PC apple dell mobile samsung tablet
# 1: X001 201506 10 20 10 20 0 0
# 2: X001 201508 17 20 0 0 17 20
# 3: X002 201507 0 0 0 20 35 15
# 4: X002 201508 14 0 11 0 14 11
# 5: X003 201507 0 25 9 25 0 9
# 6: X003 201508 12 0 21 9 0 0
As per OPs comment, you can easily add a prefix by adding also variable
to the formula. This way, the column will be also ordered properly
dcast(melt(df, c(1:2, 5)), sub + month ~ variable + value, sum, value.var = "revenue")
# sub month brand_apple brand_dell brand_samsung tech_PC tech_mobile tech_tablet
# 1: X001 201506 20 10 0 10 20 0
# 2: X001 201508 20 0 17 17 0 20
# 3: X002 201507 0 0 35 0 20 15
# 4: X002 201508 0 11 14 14 0 11
# 5: X003 201507 25 9 0 0 25 9
# 6: X003 201508 0 21 0 12 9 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With