
Understanding the apply and outer functions in R

Tags:

r

Suppose I have a data frame that looks like this:

ID A B C 
1  X 1 10
1  X 2 10
1  Z 3 15
1  Y 4 12
2  Y 1 15
2  X 2 13
2  X 3 13
2  Y 4 13
3  Y 1 16
3  Y 2 18
3  Y 3 19
3  Y 4 10

I want to compare these values with each other: if an ID has changed its value of variable A over the period given by variable B (which runs from 1 to 4), its rows go into data frame K; if it hasn't, they go into data frame L.

So for this data set, K will look like

ID A B C 
1  X 1 10
1  X 2 10
1  Z 3 15
1  Y 4 12
2  Y 1 15
2  X 2 13
2  X 3 13
2  Y 4 13

and L will look like

ID A B C 
3  Y 1 16
3  Y 2 18
3  Y 3 19
3  Y 4 10

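In terms of nested loops and if-then-else statements, it can be solved like the following:

K <- L <- df[0, ]                      # empty data frames with df's columns
for (id in unique(df$ID)) {
  sub <- df[df$ID == id, ]             # rows for this ID, assumed ordered by B
  m <- 0                               # number of times A changes within this ID
  for (j in seq_len(nrow(sub) - 1)) {
    if (sub$A[j] != sub$A[j + 1]) m <- m + 1
  }
  # no change: rows go to L; at least one change: rows go to K
  if (m == 0) L <- rbind(L, sub) else K <- rbind(K, sub)
}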

I have read in some posts that in R nested loops can be replaced by the apply and outer functions. Could someone help me understand how they can be used in such circumstances?

Jay khan asked Jan 08 '23 18:01



2 Answers

So basically you don't need a loop with conditions here. All you need to do is check, within each ID (i.e. across each cycle of B), whether A has any variance, and then negate the result with ! so that zero variance becomes TRUE. For that, A is converted to a numeric value (I'm assuming it's a factor in your real data set; if it's not a factor, you can use FUN = function(x) length(unique(x)) within ave instead). Then split accordingly. With base R we can use ave for such a task, for example:

indx <- !with(df, ave(as.numeric(A), ID , FUN = var))

Or (if A is a character rather than a factor)

indx <- with(df, ave(A, ID , FUN = function(x) length(unique(x)))) == 1L
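To see what ave hands back before it is turned into a logical, here is a minimal sketch. The construction of df is an assumption (the question only shows the printed table), and as.numeric(factor(A)) is used so the same line works whether A is stored as character or as a factor.

```r
# Rebuild the example data from the question (assumed construction)
df <- data.frame(
  ID = rep(1:3, each = 4),
  A  = c("X", "X", "Z", "Y",  "Y", "X", "X", "Y",  "Y", "Y", "Y", "Y"),
  B  = rep(1:4, times = 3),
  C  = c(10, 10, 15, 12,  15, 13, 13, 13,  16, 18, 19, 10),
  stringsAsFactors = FALSE
)

# ave() evaluates FUN once per ID and repeats the result on every row of that ID,
# so the output has one element per row of df
n_unique <- ave(as.numeric(factor(df$A)), df$ID,
                FUN = function(x) length(unique(x)))
n_unique
# [1] 3 3 3 3 2 2 2 2 1 1 1 1

# TRUE for rows whose ID never changes A
indx <- n_unique == 1
```

Because the result is already row-aligned with df, it can be passed straight to split without any merging step.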

Then simply run split

split(df, indx)
# $`FALSE`
#   ID A B  C
# 1  1 X 1 10
# 2  1 X 2 10
# 3  1 Z 3 15
# 4  1 Y 4 12
# 5  2 Y 1 15
# 6  2 X 2 13
# 7  2 X 3 13
# 8  2 Y 4 13
# 
# $`TRUE`
#    ID A B  C
# 9   3 Y 1 16
# 10  3 Y 2 18
# 11  3 Y 3 19
# 12  3 Y 4 10

This will return a list with two data frames.
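If you want the two pieces as separate objects named K and L, as in the question, a sketch along these lines should do it. The df construction and the name parts are assumptions made for illustration; A is stored as character here, so the length(unique(x)) variant is used.

```r
# Rebuild the example data from the question (assumed construction)
df <- data.frame(
  ID = rep(1:3, each = 4),
  A  = c("X", "X", "Z", "Y",  "Y", "X", "X", "Y",  "Y", "Y", "Y", "Y"),
  B  = rep(1:4, times = 3),
  C  = c(10, 10, 15, 12,  15, 13, 13, 13,  16, 18, 19, 10),
  stringsAsFactors = FALSE
)

# TRUE for rows whose ID has a single distinct value of A
indx <- with(df, ave(A, ID, FUN = function(x) length(unique(x)))) == 1L

# split() names the list elements after the values of indx
parts <- split(df, indx)
K <- parts[["FALSE"]]   # IDs where A changed at some point
L <- parts[["TRUE"]]    # IDs where A stayed the same
```

Keeping the result as a list is often more convenient than two free-standing data frames, but extracting them by name like this matches what the question asked for.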


Similarly with data.table

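library(data.table)
# var(A) does not work on a factor or character column, so count distinct values per ID instead
setDT(df)[, indx := uniqueN(A) == 1L, by = ID]
split(df, df$indx)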

Or dplyr

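library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(indx = n_distinct(A) == 1) %>%   # var(A) does not work on non-numeric A
  ungroup() %>%
  split(., .$indx)                        # indx has to be reached through the piped data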

David Arenburg answered Jan 15 '23 15:01



Since you want to understand apply rather than simply getting it done, you can consider tapply. As a demonstration:

> tapply(df$A, df$ID, function(x) ifelse(length(unique(x))>1, "K", "L"))
  1   2   3 
"K" "K" "L" 

In a bit plainer English: go through all of df$A grouped by df$ID, and apply the function to df$A within each group (i.e. the x in the embedded function): if the number of unique values is more than 1, it's "K", otherwise it's "L".
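tapply gives one label per ID rather than one per row, so an extra lookup step is needed to get the actual K and L data frames. A minimal sketch, assuming df is built as below (the construction and the names grp, row_grp and parts are illustrative, not from the question):

```r
# Rebuild the example data from the question (assumed construction)
df <- data.frame(
  ID = rep(1:3, each = 4),
  A  = c("X", "X", "Z", "Y",  "Y", "X", "X", "Y",  "Y", "Y", "Y", "Y"),
  B  = rep(1:4, times = 3),
  C  = c(10, 10, 15, 12,  15, 13, 13, 13,  16, 18, 19, 10),
  stringsAsFactors = FALSE
)

# One label per ID; the result is named by ID ("1", "2", "3")
grp <- tapply(df$A, df$ID, function(x) if (length(unique(x)) > 1) "K" else "L")

# Look up each row's label by its ID, then drop the names so split() gets a plain vector
row_grp <- as.vector(grp[as.character(df$ID)])

parts <- split(df, row_grp)
K <- parts$K   # rows of IDs 1 and 2
L <- parts$L   # rows of ID 3
```

The lookup by as.character(df$ID) is what ave does implicitly, which is why ave is the shorter route when a per-row result is what you need.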

Ricky answered Jan 15 '23 14:01
