R Recode variable for all observations that do not occur more than once

Tags: r, recode

I have a simple dataframe that looks like the following:

Observation X1 X2 Group
1           2   4   1
2           6   3   2
3           8   4   2
4           1   3   3
5           2   8   4
6           7   5   5
7           2   4   5

How can I recode the Group variable so that every group value that occurs only once is recoded as "Unaffiliated"?

The desired output would be the following:

Observation X1 X2 Group
1           2   4   Unaffiliated
2           6   3   2
3           8   4   2
4           1   3   Unaffiliated
5           2   8   Unaffiliated
6           7   5   5
7           2   4   5

flâneur asked Oct 12 '25 09:10


2 Answers

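We can use duplicated() to build a logical vector that is TRUE for Group values occurring only once, then assign "Unaffiliated" to Group in those rows. Calling duplicated() in both directions (the default and fromLast = TRUE) flags every member of a repeated group, so negating the combined result leaves only the non-duplicates.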

# TRUE for rows whose Group value appears exactly once
once <- with(df1, !(duplicated(Group) | duplicated(Group, fromLast = TRUE)))
df1$Group[once] <- "Unaffiliated"

Output:

> df1
  Observation X1 X2        Group
1           1  2  4 Unaffiliated
2           2  6  3            2
3           3  8  4            2
4           4  1  3 Unaffiliated
5           5  2  8 Unaffiliated
6           6  7  5            5
7           7  2  4            5

data

df1 <- structure(list(Observation = 1:7, X1 = c(2L, 6L, 8L, 1L, 2L, 
7L, 2L), X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L), Group = c(1L, 2L, 
2L, 3L, 4L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
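
To see why both duplicated() calls are needed, the following minimal sketch breaks the condition into its parts, using the same Group values as the example data. The names fwd, bwd and singleton are illustrative only and are not part of the answer above.

```r
# Same Group values as in the example data
group <- c(1L, 2L, 2L, 3L, 4L, 5L, 5L)

# TRUE for the second and later occurrences of a value
fwd <- duplicated(group)
# FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE

# TRUE for every occurrence except the last one
bwd <- duplicated(group, fromLast = TRUE)
# FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE

# A value occurs exactly once only if neither scan flags it
singleton <- !(fwd | bwd)
#  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE

print(singleton)
```

Using only one of the two scans would leave one member of each repeated group unflagged, so that row would be recoded by mistake.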
akrun answered Oct 13 '25 22:10


unaffil takes the vector of Group values belonging to a single group and returns "Unaffiliated" if it has exactly one element; otherwise it returns the input unchanged. We can then apply it by Group using ave. This does not overwrite the input, because transform returns a modified copy. No packages are used, but if you use dplyr then transform can be replaced with mutate.

unaffil <- function(x) if (length(x) == 1) "Unaffiliated" else x
transform(dat, Group = ave(Group, Group, FUN = unaffil))

giving

  Observation X1 X2        Group
1           1  2  4 Unaffiliated
2           2  6  3            2
3           3  8  4            2
4           4  1  3 Unaffiliated
5           5  2  8 Unaffiliated
6           6  7  5            5
7           7  2  4            5

Note

dat <- structure(list(Observation = 1:7, X1 = c(2L, 6L, 8L, 1L, 2L, 
7L, 2L), X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L), Group = c(1L, 2L, 
2L, 3L, 4L, 5L, 5L)), class = "data.frame", row.names = c(NA, 
-7L))
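
As a further illustration of the per-group idea, here is a minimal sketch that first counts how often each Group value occurs and then recodes the rows whose count is one. It is an alternative under the same assumptions, not the method used in the answer above; the names n and out are hypothetical.

```r
# Example data, matching the question
dat <- data.frame(
  Observation = 1:7,
  X1 = c(2L, 6L, 8L, 1L, 2L, 7L, 2L),
  X2 = c(4L, 3L, 4L, 3L, 8L, 5L, 4L),
  Group = c(1L, 2L, 2L, 3L, 4L, 5L, 5L)
)

# Size of the group that each row belongs to
n <- ave(seq_along(dat$Group), dat$Group, FUN = length)

# Work on a copy so the input data frame is left untouched
out <- dat
out$Group <- ifelse(n == 1, "Unaffiliated", as.character(dat$Group))

print(out)
```

Note that out$Group is a character vector, because a single column cannot hold both integers and strings; the remaining group labels are therefore stored as "2" and "5" rather than as numbers.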
G. Grothendieck answered Oct 13 '25 22:10