It seems like such a simple problem, yet I've been pulling my hair out trying to get this to work:
Given this data frame identifying the interactions id had with contact, who is grouped by contactGrp,
head(data)
    id               sesTs contact contactGrp   relpos maxpos
1 6849 2012-06-25 15:58:34   peter       west 0.000000      3
2 6849 2012-06-25 18:24:49   sarah      south 0.500000      3
3 6849 2012-06-27 00:13:30   sarah      south 1.000000      3
4 1235 2012-06-29 17:49:35   peter       west 0.000000      2
5 1235 2012-06-29 23:56:35   peter       west 1.000000      2
6 5893 2012-06-30 22:21:33    carl       east 0.000000      1
how many contacts were there for each value of unique(data$contactGrp) with relpos=1 and maxpos>1?
The expected result would be:
1 west 1
2 south 1
3 east 0
A small subset of the lines I have tried:
aggregate(data, by=list('contactGrp'), FUN=count)
    yields an error, no filtering
data.table
    seems to require a key, which is not unique in this data…
ddply(data,"contactGrp",summarise,count=???)
    not sure which function to use to fill the count column
ddply(subset(data,maxpos>1 & relpos==0), c('contactGrp'), function(df)count(df$relpos))
    works but gives me an extra column x, and it feels like I've overcomplicated it…
SQL would be easy: Select contactGrp, count(*) as cnt from data where … Group by contactGrp
    but I'm trying to learn R
And here is the data.table solution:
> library(data.table)
> dt <- data.table(sessions)
> dt[, length(contact[relpos == 0 & maxpos > 1]), by = contactGrp]
contactGrp V1
[1,] west 2
[2,] south 0
[3,] east 0
> dt[, length(contact[relpos == 1 & maxpos > 1]), by = contactGrp]
contactGrp V1
[1,] west 1
[2,] south 1
[3,] east 0
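For comparison, the same count can be sketched in base R without any extra packages. This is only one possible approach, not the answerer's method: it rebuilds the six sample rows from the question (leaving out sesTs, which the count does not need), flags the rows that match the filter, and sums the flags per group. The column names hit and cnt are made up for this illustration.

```r
# Rebuild the six sample rows shown in the question.
# sesTs is omitted on purpose; it plays no role in the count.
data <- data.frame(
  id         = c(6849, 6849, 6849, 1235, 1235, 5893),
  contact    = c("peter", "sarah", "sarah", "peter", "peter", "carl"),
  contactGrp = c("west", "south", "south", "west", "west", "east"),
  relpos     = c(0, 0.5, 1, 0, 1, 0),
  maxpos     = c(3, 3, 3, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Flag the rows that satisfy the filter (hypothetical helper column "hit").
data$hit <- data$relpos == 1 & data$maxpos > 1

# Summing a logical vector counts its TRUE values, so a group with no
# matching rows still appears in the result with a count of 0.
cnt <- aggregate(hit ~ contactGrp, data = data, FUN = sum)
names(cnt)[2] <- "cnt"

print(cnt)
```

Filtering with subset() first and counting afterwards would drop groups that have no matching rows (east here), which is why this sketch sums a logical flag instead. The same idea should carry over to the data.table call above by writing sum(relpos == 1 & maxpos > 1) in place of the length(...) expression.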