Understanding the apply and outer functions in R

Question

Suppose I have data that looks like this

I want to compare these values with each other: if an ID has changed its value of variable A over the period of variable B (which runs from 1 to 4), its rows go into data frame K; if it hasn't, they go into data frame L.

So in this data set K will look like

and L will look like
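The example tables are not shown in the question. A plausible reconstruction, assuming the rows printed in the accepted answer's split() output are the whole data set and that A is a factor (as that answer assumes), might look like this. K and L here are just the expected results written out by hand, not a general solution.

```r
# Reconstructed from the split() output in the accepted answer (an assumption,
# not the asker's original table).
df <- data.frame(
  ID = rep(1:3, each = 4),
  A  = factor(c("X", "X", "Z", "Y",  "Y", "X", "X", "Y",  "Y", "Y", "Y", "Y")),
  B  = rep(1:4, times = 3),
  C  = c(10, 10, 15, 12,  15, 13, 13, 13,  16, 18, 19, 10)
)

# Expected result: IDs 1 and 2 change their A value over B, ID 3 does not.
K <- df[df$ID %in% c(1, 2), ]
L <- df[df$ID == 3, ]
```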

In terms of nested loops and if/else statements, it can be solved like the following:

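# K collects IDs whose A changes across B; L collects IDs whose A stays constant
K <- L <- df[0, ]
for (id in unique(df$ID)) {
  rows <- df[df$ID == id, ]
  m <- 0  # number of changes in A between consecutive rows
  for (j in seq_len(nrow(rows) - 1)) {
    if (rows$A[j] != rows$A[j + 1]) m <- m + 1
  }
  if (m == 0) L <- rbind(L, rows) else K <- rbind(K, rows)
}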

I have read in some posts that nested loops in R can be replaced by the apply and outer functions. Could someone help me understand how they can be used in such circumstances?

David Arenburg · Accepted Answer

You don't need a loop with conditions here. All you need to do is check, for each ID (that is, for each cycle of B), whether A has any variance, and then split the data accordingly. var needs numeric input, so A is converted with as.numeric first (I'm assuming it's a factor in your real data set; if it's not a factor, you can use FUN = function(x) length(unique(x)) within ave instead). Applying ! to the variance converts it to a logical that is TRUE where the variance is zero, i.e. where A never changes. In base R, ave handles this kind of task, for example

indx <- !with(df, ave(as.numeric(A), ID , FUN = var))

Or (if A is a character rather than a factor)

indx <- with(df, ave(A, ID , FUN = function(x) length(unique(x)))) == 1L
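To see what ave contributes in the first version, the computation can be unpacked into steps. This is only a sketch: it assumes the data set printed in the split() output of this answer, with A stored as a factor, and the name v is introduced purely for illustration.

```r
# Data reconstructed from the split() output (assumed to be the full data set).
df <- data.frame(
  ID = rep(1:3, each = 4),
  A  = factor(c("X", "X", "Z", "Y",  "Y", "X", "X", "Y",  "Y", "Y", "Y", "Y")),
  B  = rep(1:4, times = 3),
  C  = c(10, 10, 15, 12,  15, 13, 13, 13,  16, 18, 19, 10)
)

# ave() applies FUN within each ID and returns a vector as long as its input,
# so every row carries the variance of its own group.
v <- ave(as.numeric(df$A), df$ID, FUN = var)

# ! coerces to logical: a variance of 0 becomes TRUE, anything else FALSE.
indx <- !v

# One flag per ID, for a quick check.
tapply(indx, df$ID, unique)
```

Because ave returns one value per row rather than one per group, indx lines up with the rows of df and can be passed straight to split.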

Then simply run split

split(df, indx)
# $`FALSE`
# ID A B  C
# 1  1 X 1 10
# 2  1 X 2 10
# 3  1 Z 3 15
# 4  1 Y 4 12
# 5  2 Y 1 15
# 6  2 X 2 13
# 7  2 X 3 13
# 8  2 Y 4 13
# 
# $`TRUE`
# ID A B  C
# 9   3 Y 1 16
# 10  3 Y 2 18
# 11  3 Y 3 19
# 12  3 Y 4 10

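This returns a list of two data frames: the FALSE element holds the IDs whose A changed (your K) and the TRUE element holds the IDs whose A stayed the same (your L).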
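If separate objects named K and L are wanted, as in the question, they can be pulled out of that list by name. A minimal sketch, assuming the same reconstructed data set and a factor A; parts is a hypothetical name.

```r
# Data reconstructed from the split() output above (an assumption).
df <- data.frame(
  ID = rep(1:3, each = 4),
  A  = factor(c("X", "X", "Z", "Y",  "Y", "X", "X", "Y",  "Y", "Y", "Y", "Y")),
  B  = rep(1:4, times = 3),
  C  = c(10, 10, 15, 12,  15, 13, 13, 13,  16, 18, 19, 10)
)

indx  <- !with(df, ave(as.numeric(A), ID, FUN = var))
parts <- split(df, indx)   # list with elements named "FALSE" and "TRUE"

K <- parts[["FALSE"]]      # IDs whose A changed
L <- parts[["TRUE"]]       # IDs whose A stayed constant
```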

Similarly with data.table

library(data.table)
# var() needs numeric input, so convert the factor first
setDT(df)[, indx := !var(as.numeric(A)), by = ID]
split(df, df$indx)

Or dplyr

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(indx = !var(as.numeric(A))) %>%  # var() needs numeric input
  ungroup() %>%
  split(., .$indx)

Ricky · Answer

Since you want to understand apply rather than simply get it done, consider tapply. As a demonstration:

> tapply(df$A, df$ID, function(x) ifelse(length(unique(x))>1, "K", "L"))
  1   2   3 
"K" "K" "L"

In a bit plainer English: go through df$A grouped by df$ID, and apply the function to the values of df$A within each group (the x in the anonymous function): if the number of unique values is greater than 1, the result is "K", otherwise it's "L".
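The tapply result has one label per ID, not one per row, so it still has to be mapped back onto the rows before the data can be divided. One way this might be done, sketched under the assumption that the data set is the one shown in the accepted answer's output; grp, label and parts are hypothetical names.

```r
# Data reconstructed from the accepted answer's output (an assumption).
df <- data.frame(
  ID = rep(1:3, each = 4),
  A  = factor(c("X", "X", "Z", "Y",  "Y", "X", "X", "Y",  "Y", "Y", "Y", "Y")),
  B  = rep(1:4, times = 3),
  C  = c(10, 10, 15, 12,  15, 13, 13, 13,  16, 18, 19, 10)
)

# One label per ID; the result is named by ID ("1", "2", "3").
grp <- tapply(df$A, df$ID, function(x) if (length(unique(x)) > 1) "K" else "L")

# Look up each row's label through its ID, then split on that label.
label <- as.vector(grp[as.character(df$ID)])
parts <- split(df, label)

K <- parts$K
L <- parts$L
```

The scalar if/else is used here instead of ifelse because the condition is a single value per group; both give the same labels.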

Tags:

r

Jay khan
