Let's say I have a data frame like this:
df <- data.frame(a=letters[1:26],1:26)
And I would like to "re" factor a, b, and c as "a".
How do I do that?
One option is the recode()
function in package car
:
require(car)
df <- data.frame(a=letters[1:26],1:26)
df2 <- within(df, a <- recode(a, 'c("a","b","c")="a"'))
> head(df2)
a X1.26
1 a 1
2 a 2
3 a 3
4 d 4
5 e 5
6 f 6
Example where a
is not so simple and we recode several levels into one.
set.seed(123)
df3 <- data.frame(a = sample(letters[1:5], 100, replace = TRUE),
b = 1:100)
with(df3, head(a))
with(df3, table(a))
the last lines giving:
> with(df3, head(a))
[1] b d c e e a
Levels: a b c d e
> with(df3, table(a))
a
a b c d e
19 20 21 22 18
Now lets combine levels a
and e
into level Z
using recode()
df4 <- within(df3, a <- recode(a, 'c("a","e")="Z"'))
with(df4, head(a))
with(df4, table(a))
which gives:
> with(df4, head(a))
[1] b d c Z Z Z
Levels: b c d Z
> with(df4, table(a))
a
b c d Z
20 21 22 37
Doing this without spelling out the levels to merge:
## Select the levels you want (here 'a' and 'e')
lev.want <- with(df3, levels(a)[c(1,5)])
## now paste together
lev.want <- paste(lev.want, collapse = "','")
## then bolt on the extra bit
codes <- paste("c('", lev.want, "')='Z'", sep = "")
## then use within recode()
df5 <- within(df3, a <- recode(a, codes))
with(df5, table(a))
Which gives us the same as df4
above:
> with(df5, table(a))
a
b c d Z
20 21 22 37
Has anyone tried using this simple method? It requires no special packages, just an understanding of how R treats factors.
Say you want to rename the levels in a factor, get their indices
data <- data.frame(a=letters[1:26],1:26)
lalpha <- levels(data$a)
In this example we imagine we want to know the index for the level 'e' and 'w'
lalpha <- levels(data$a)
ind <- c(which(lalpha == 'e'), which(lalpha == 'w'))
Now we can use this index to replace the levels of the factor 'a'
levels(data$a)[ind] <- 'X'
If you now look at the dataframe factor a
there will be an X where there was an e
and w
I leave it to you to try the result.
You could do something like:
df$a[df$a %in% c("a","b","c")] <- "a"
UPDATE: More complicated factors.
Data <- data.frame(a=sample(c("Less than $50,000","$50,000-$99,999",
"$100,000-$249,999", "$250,000-$500,000"),20,TRUE),n=1:20)
rows <- Data$a %in% c("$50,000-$99,999", "$100,000-$249,999")
Data$a[rows] <- "$250,000-$500,000"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With