I have a data frame with two columns. I want to add an additional two columns to the data set with counts based on aggregates.
df <- structure(list(ID = c(1045937900, 1045937900),
SMS.Type = c("DF1", "WCB14"),
SMS.Date = c("12/02/2015 19:51", "13/02/2015 08:38"),
Reply.Date = c("", "13/02/2015 09:52")
), row.names = 4286:4287, class = "data.frame")
I want to simply count the number of Instances of SMS.Type and Reply.Date where there is no null. So in the toy example below, i will generate the 2 for SMS.Type and 1 for Reply.Date
I then want to add this to the data frame as total counts (Im aware they will duplicate out for the number of rows in the original dataset but thats ok)
I have been playing around with aggregate and count function but to no avail
mytempdf <-aggregate(cbind(testtrain$SMS.Type,testtrain$Response.option)~testtrain$ID,
train,
function(x) length(unique(which(!is.na(x)))))
mytempdf <- aggregate(testtrain$Reply.Date~testtrain$ID,
testtrain,
function(x) length(which(!is.na(x))))
Can anyone help?
Thank you for your time
Using data.table
you could do (I've added a real NA
to your original data).
I'm also not sure if you really looking for length(unique())
or just length
?
library(data.table)
cols <- c("SMS.Type", "Reply.Date")
setDT(df)[, paste0(cols, ".count") :=
lapply(.SD, function(x) length(unique(na.omit(x)))),
.SDcols = cols,
by = ID]
# ID SMS.Type SMS.Date Reply.Date SMS.Type.count Reply.Date.count
# 1: 1045937900 DF1 12/02/2015 19:51 NA 2 1
# 2: 1045937900 WCB14 13/02/2015 08:38 13/02/2015 09:52 2 1
In the devel version (v >= 1.9.5) you also could use uniqueN
function
Explanation
This is a general solution which will work on any number of desired columns. All you need to do is to put the columns names into cols
.
lapply(.SD,
is calling a certain function over the columns specified in .SDcols = cols
paste0(cols, ".count")
creates new column names while adding count
to the column names specified in cols
:=
performs assignment by reference, meaning, updates the newly created columns with the output of lapply(.SD,
in place
by
argument is specifying the aggregator columnsAfter converting your empty strings to NAs:
library(dplyr)
mutate(df, SMS.Type.count = sum(!is.na(SMS.Type)),
Reply.Date.count = sum(!is.na(Reply.Date)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With