Grouping/recoding factors in the same data.frame

Tags:

r

Let's say I have a data frame like this:

df <- data.frame(a=letters[1:26],1:26)

I would like to recode the levels "a", "b", and "c" of the factor a into a single level, "a".

How do I do that?

Brandon Bertelsen asked Oct 06 '10 18:10


3 Answers

One option is the recode() function in package car:

require(car)
df <- data.frame(a=letters[1:26],1:26)
df2 <- within(df, a <- recode(a, 'c("a","b","c")="a"'))
> head(df2)
  a X1.26
1 a     1
2 a     2
3 a     3
4 d     4
5 e     5
6 f     6

Here is an example where a is not so simple and several levels are recoded into one:

set.seed(123)
df3 <- data.frame(a = sample(letters[1:5], 100, replace = TRUE),
                  b = 1:100)
with(df3, head(a))
with(df3, table(a))

The last two lines give:

> with(df3, head(a))
[1] b d c e e a
Levels: a b c d e
> with(df3, table(a))
a
 a  b  c  d  e 
19 20 21 22 18

Now let's combine levels a and e into level Z using recode():

df4 <- within(df3, a <- recode(a, 'c("a","e")="Z"'))
with(df4, head(a))
with(df4, table(a))

which gives:

> with(df4, head(a))
[1] b d c Z Z Z
Levels: b c d Z
> with(df4, table(a))
a
 b  c  d  Z 
20 21 22 37

Doing this without spelling out the levels to merge:

## Select the levels you want (here 'a' and 'e')
lev.want <- with(df3, levels(a)[c(1,5)])
## now paste together
lev.want <- paste(lev.want, collapse = "','")
## then bolt on the extra bit
codes <- paste("c('", lev.want, "')='Z'", sep = "")
## then use within recode()
df5 <- within(df3, a <- recode(a, codes))
with(df5, table(a))

Which gives us the same as df4 above:

> with(df5, table(a))
a
 b  c  d  Z 
20 21 22 37 
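
To see what the paste() calls above actually hand to recode(), the recode string can be built and printed on its own, without loading car. This is only a sketch: the small factor a below is a hypothetical stand-in for df3$a, created explicitly so the levels are the same on any R version.

```r
## Hypothetical stand-in for df3$a, with the same five levels
a <- factor(c("b", "d", "c", "e", "e", "a"), levels = letters[1:5])

## Pick levels by position (1 = "a", 5 = "e"), as in the answer
lev.want <- levels(a)[c(1, 5)]
## Join them with ',' so they sit inside a quoted c(...) call
lev.want <- paste(lev.want, collapse = "','")
## Wrap in c('...') and append the target level
codes <- paste("c('", lev.want, "')='Z'", sep = "")

print(codes)
## [1] "c('a','e')='Z'"
```

The result is the same kind of recode specification that was typed by hand for df4, only with single quotes inside the string.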
Gavin Simpson answered Nov 25 '22 17:11


Has anyone tried using this simple method? It requires no special packages, just an understanding of how R treats factors.

Say you want to rename some levels of a factor. First get the factor's levels:

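## stringsAsFactors = TRUE is needed in R >= 4.0, where strings are no longer
## converted to factors by default and levels(data$a) would be NULL
data <- data.frame(a=letters[1:26],1:26, stringsAsFactors=TRUE)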
lalpha <- levels(data$a)

In this example, suppose we want the indices of the levels 'e' and 'w':

lalpha <- levels(data$a)
ind <- c(which(lalpha == 'e'), which(lalpha == 'w'))

Now we can use this index to replace the levels of the factor 'a'

levels(data$a)[ind] <- 'X'

If you now look at the factor a in the data frame, there will be an X wherever there was an e or a w.

I leave it to you to try the result.
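
For anyone who wants to check it, here is a minimal self-contained sketch of the steps above. It assumes the column must be created as a factor explicitly (stringsAsFactors = TRUE, which matters in R >= 4.0), names the second column n for convenience, and uses %in% in place of the two which() calls.

```r
## Build the data frame with a as a factor (assumption: R >= 4.0 defaults
## to character columns, so we ask for factors explicitly)
data <- data.frame(a = letters[1:26], n = 1:26, stringsAsFactors = TRUE)

## Positions of the levels to rename
lalpha <- levels(data$a)
ind <- which(lalpha %in% c("e", "w"))

## Giving both positions the same name merges them into one level
levels(data$a)[ind] <- "X"

nlevels(data$a)                  # 25: 'e' and 'w' collapsed into 'X'
as.character(data$a[c(5, 23)])   # "X" "X" (rows that held 'e' and 'w')
```

Note that the new level is an uppercase "X", so it stays distinct from the existing lowercase level "x".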

Pancho Mulongeni answered Nov 25 '22 16:11


You could do something like:

df$a[df$a %in% c("a","b","c")] <- "a"

UPDATE: More complicated factors.

Data <- data.frame(a=sample(c("Less than $50,000","$50,000-$99,999",
  "$100,000-$249,999", "$250,000-$500,000"),20,TRUE),n=1:20)
rows <- Data$a %in% c("$50,000-$99,999", "$100,000-$249,999")
Data$a[rows] <- "$250,000-$500,000"
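
One thing to watch with this approach when a is a factor: the replacement value has to be an existing level (assigning any other value produces NA with a warning), and the levels that were recoded away stay behind with zero counts. A small sketch, assuming the column is created as a factor explicitly and the second column is named n for convenience:

```r
## a as a factor (assumption: stringsAsFactors = TRUE for R >= 4.0)
df <- data.frame(a = letters[1:26], n = 1:26, stringsAsFactors = TRUE)

## Recode b and c to the existing level "a"
df$a[df$a %in% c("a", "b", "c")] <- "a"

## "b" and "c" are still levels, just with no observations
table(df$a)[c("a", "b", "c")]

## Drop the now-empty levels
df$a <- droplevels(df$a)
nlevels(df$a)   # 24
```

If the column is a plain character vector instead, neither caveat applies and the one-line assignment is all that is needed.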
Joshua Ulrich answered Nov 25 '22 17:11