Replace factor values with the next higher factor level

Tags:

r

factors

I have a question on factors in R. I want to replace each value of a factor with the next higher factor level. Here's an example:

Suppose I have the vector have, which I then convert to a factor:

set.seed(1)

have <- sample(1:20, 10, TRUE)
have
#  [1]  4  7  1  2 11 14 18 19  1 10

What I would like to get is this:

[1] 7    10   2    4    14   18   19   <NA> 2    11

That is, each value is replaced with the next higher factor level (4 becomes 7, 7 becomes 10, etc.), and the highest value is replaced with NA.

One way to achieve this would be

want <- factor(have)
levels(want) <- c(levels(want)[-1], NA)
want
# [1] 7    10   2    4    14   18   19   <NA> 2    11  
# Levels: 2 4 7 10 11 14 18 19

Is there another way to do this?

I have received three very good answers that I'll try to summarize here:

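# Lookup vector indexed by value: position v holds the next higher value after v
func_lookup <- function(x){
  lu <- sort(unique(x))
  lu <- "[<-"(NA, lu, c(lu[-1], NA))
  lu[x]
}

# Shift the factor levels by one; the highest level becomes NA
func_dplyr <- function(x){
  levels(x) <- dplyr::lead(levels(x))
  x
}

# Position of each element among the sorted unique values, plus one
func_base <- function(x){
  vals <- sort(unique(x))
  vals[match(x, vals) + 1]
}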

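As the examples below show, func_lookup only works for plain vectors (given a factor, it returns the integer codes of the next levels rather than the levels themselves), while func_dplyr only works for factors. func_base works with both factors and vectors.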

# Example 1
set.seed(1)
# create sample data
have <- c(4, 6, 6, 7)

# create sample data as factor
have_f <- factor(have)

# test functions for factor
have_f
func_lookup(have_f)
func_dplyr(have_f)
func_base(have_f)


#> have_f
#[1] 4 6 6 7
#Levels: 4 6 7
#> func_lookup(have_f)
#[1]  2  3  3 NA
#> func_dplyr(have_f)
#[1] 6    7    7    <NA>
#Levels: 6 7
#> func_base(have_f)
#[1] 6    7    7    <NA>
#Levels: 4 6 7

# test functions for vectors
func_lookup(have)
#func_dplyr(have)
func_base(have)

#> func_lookup(have)
#[1]  6  7  7 NA
#> func_base(have)
#[1]  6  7  7 NA
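
As a quick sanity check (a sketch, not part of the original answers), func_base can be compared against the levels-shifting approach from the start of the question. The input below is the vector printed in the question, typed in by hand so that the check does not depend on the random number generator; the variable name same is only for illustration.

```r
# func_base as defined in the summary above
func_base <- function(x) {
  vals <- sort(unique(x))
  vals[match(x, vals) + 1]
}

# The values printed in the question for `have`
have <- c(4L, 7L, 1L, 2L, 11L, 14L, 18L, 19L, 1L, 10L)

# Original approach: shift the factor levels by one position
want <- factor(have)
levels(want) <- c(levels(want)[-1], NA)

# Compare as character so the factor and the integer result line up
same <- identical(as.character(func_base(have)), as.character(want))
print(same)
```

Comparing as character sidesteps the difference in return type: the levels approach yields a factor, while func_base returns a value of the same type as its input.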

A.Fischer asked Dec 31 '22 15:12


1 Answer

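Sort the unique values of have, use match to get the position of each element within those sorted values, add 1 to move to the next higher value, and subset the sorted values with the result. The highest value maps to a position past the end of the vector, which R returns as NA.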

vals <- sort(unique(have))
vals[match(have, vals) + 1]
#[1]  7 10  2  4 14 18 19 NA  2 11
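
To make the intermediate steps visible, here is a minimal sketch that runs the same idea on the small vector from the question's Example 1. The names idx and nxt are introduced only for illustration.

```r
# Small input taken from the question's Example 1
have <- c(4, 6, 6, 7)

vals <- sort(unique(have))  # 4 6 7: distinct values in ascending order
idx  <- match(have, vals)   # 1 2 2 3: position of each element within vals
nxt  <- vals[idx + 1]       # 6 7 7 NA: position 4 is past the end, hence NA

print(nxt)
```

Because out-of-range subsetting yields NA, the highest value needs no special handling.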
Ronak Shah answered Jan 31 '23 22:01