Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count based on multiple conditions from other data.frame

Tags:

dataframe

r

I am migrating analysis from Excel to R, and would like some input on how best to perform something similar to Excel's COUNTIFS in R.

I have a two data.frames, statedf and memberdf.

statedf=data.frame(state=c('MD','MD','MD','NY','NY','NY'), week = 5:7) 
memberdf=data.frame(memID = 1:15, state = c('MD','MD','NY','NY','MD'),
              finalweek = c(3,3,5,3,3,5,3,5,3,5,6,5,2,3,5),
              orders = c(1,2,3))

This data is for a subscription-based business. I would like to know the number of members who newly lapsed for each week/state combo in statedf, where newly lapse is defined by statedf$week - 1 = memberdf$finalweek. Further I would like to have separate counts for each order value (1,2,3).

The desired output would look like

out <- data.frame(state=c('MD','MD','MD','NY','NY','NY'), week = 5:7,
               oneorder = c(0,1,0,0,0,0),
               twoorder = c(0,0,1,0,1,0),
               threeorder = c(0,3,0,0,1,0))

I asked (and got a great response for) a simpler version of this question yesterday - the answers revolved around creating a new data.frame based on member.df. However, I need to append the data to statedf, because statedf has member/week combos that don't exist in memberdf, and vice versa. If this was in Excel, I'd use COUNTIFS but am struggling for a solution in R.

Thanks.

like image 248
SFuj Avatar asked Apr 21 '15 13:04

SFuj


2 Answers

Here is a solution with the dplyr and tidyr packages:

library(tidyr) ; library(dplyr)

counts <- memberdf %>%
  mutate(lapsedweek = finalweek + 1) %>%
  group_by(state, lapsedweek, orders) %>%
  tally()
counts <- counts %>% spread(orders, n, fill = 0)    
out <- left_join(statedf, counts, by = c("state", "week" = "lapsedweek"))
out[is.na(out)] <- 0 # convert rows with all NAs to 0s
names(out)[3:5] <- paste0("order", names(out)[3:5]) # rename columns
like image 174
Sam Firke Avatar answered Sep 21 '22 00:09

Sam Firke


We could create a new variable ('week1') in the 'statedf' dataset, merge the 'memberdf' with 'statedf', and then reshape from 'long' to 'wide' format with dcast. I changed the 'orders' column to match the column names in the 'out'.

statedf$week1 <-  statedf$week-1
df1 <- merge(memberdf[-1], statedf, by.x=c('state', 'finalweek'), 
                 by.y=c('state', 'week1'), all.y=TRUE)
lvls <- paste0(c('one', 'two', 'three'), 'order')
df1$orders <- factor(lvls[df1$orders],levels=lvls) 
library(reshape2)
out1 <- dcast(df1, state+week~orders, value.var='orders', length)[-6]
out1
#     state week oneorder twoorder threeorder
#1    MD    5        0        0          0
#2    MD    6        1        0          3
#3    MD    7        0        1          0
#4    NY    5        0        0          0
#5    NY    6        0        1          1
#6    NY    7        0        0          0

all.equal(out, out1)
#[1] TRUE
like image 42
akrun Avatar answered Sep 19 '22 00:09

akrun