I have a data frame. I want to add two more columns to it, holding counts based on aggregates.
df <- structure(list(ID = c(1045937900, 1045937900), 
SMS.Type = c("DF1", "WCB14"), 
SMS.Date = c("12/02/2015 19:51", "13/02/2015 08:38"), 
Reply.Date = c("", "13/02/2015 09:52")
), row.names = 4286:4287, class = "data.frame")
I simply want to count the number of instances of SMS.Type and Reply.Date that are not null. So in the toy example above, I would get 2 for SMS.Type and 1 for Reply.Date.
I then want to add these to the data frame as total counts (I'm aware they will be duplicated across the rows of the original dataset, but that's OK).
I have been playing around with the aggregate and count functions, but to no avail:
mytempdf <-aggregate(cbind(testtrain$SMS.Type,testtrain$Response.option)~testtrain$ID,
                  train, 
                  function(x) length(unique(which(!is.na(x)))))
mytempdf <- aggregate(testtrain$Reply.Date~testtrain$ID,
                  testtrain, 
                  function(x) length(which(!is.na(x))))
Can anyone help?
Thank you for your time
Using data.table, you could do the following (I've added a real NA to your original data).
I'm also not sure whether you are really looking for length(unique()) or just length.
library(data.table)
cols <- c("SMS.Type", "Reply.Date")
setDT(df)[, paste0(cols, ".count") := 
                  lapply(.SD, function(x) length(unique(na.omit(x)))), 
                  .SDcols = cols, 
            by = ID]
#            ID SMS.Type         SMS.Date       Reply.Date SMS.Type.count Reply.Date.count
# 1: 1045937900      DF1 12/02/2015 19:51               NA              2                1
# 2: 1045937900    WCB14 13/02/2015 08:38 13/02/2015 09:52              2                1
In the devel version (v >= 1.9.5) you could also use the uniqueN function.
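As a rough sketch of the uniqueN variant, assuming a data.table version recent enough that uniqueN accepts na.rm (the dt object below is just the question's toy data rebuilt with a real NA, not something from the original post):

```r
library(data.table)

# Toy data from the question; the empty Reply.Date is written as a real NA.
dt <- data.table(
  ID         = c(1045937900, 1045937900),
  SMS.Type   = c("DF1", "WCB14"),
  SMS.Date   = c("12/02/2015 19:51", "13/02/2015 08:38"),
  Reply.Date = c(NA, "13/02/2015 09:52")
)

cols <- c("SMS.Type", "Reply.Date")

# uniqueN(x, na.rm = TRUE) counts the distinct non-missing values in each group;
# := adds the results as new columns by reference.
dt[, paste0(cols, ".count") := lapply(.SD, uniqueN, na.rm = TRUE),
   .SDcols = cols, by = ID]
```

If you want plain counts rather than distinct counts, swapping uniqueN for function(x) sum(!is.na(x)) should do it.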
Explanation
This is a general solution that will work on any number of columns. All you need to do is put the column names into cols.
lapply(.SD, ...) calls a function on each of the columns specified in .SDcols = cols.
paste0(cols, ".count") creates the new column names by appending ".count" to the names specified in cols.
:= performs assignment by reference, meaning it fills the newly created columns with the output of lapply(.SD, ...) in place.
The by argument specifies the grouping columns.

After converting your empty strings to NAs:
library(dplyr)
mutate(df, SMS.Type.count   = sum(!is.na(SMS.Type)),
           Reply.Date.count = sum(!is.na(Reply.Date)))
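For completeness, here is a minimal sketch of the whole thing, including the empty-string conversion. The group_by(ID) step and the result name are my additions, on the assumption that you want the counts per ID once the data has more than one ID:

```r
library(dplyr)

# Toy data from the question, with the empty string still in place.
df <- data.frame(
  ID         = c(1045937900, 1045937900),
  SMS.Type   = c("DF1", "WCB14"),
  SMS.Date   = c("12/02/2015 19:51", "13/02/2015 08:38"),
  Reply.Date = c("", "13/02/2015 09:52"),
  stringsAsFactors = FALSE
)

# Turn empty strings into real NAs so that is.na() can detect them.
df[df == ""] <- NA

# Count non-missing values within each ID and repeat the count on every row.
result <- df %>%
  group_by(ID) %>%
  mutate(SMS.Type.count   = sum(!is.na(SMS.Type)),
         Reply.Date.count = sum(!is.na(Reply.Date))) %>%
  ungroup()
```

Note that this counts all non-missing values; use n_distinct(SMS.Type, na.rm = TRUE) instead if you want distinct values only.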