Add a new level to a factor and substitute existing one

Tags:

r

I'm having a lot of trouble dealing with level names in a data frame.

I have a large data frame in which one of the columns is a factor with a LOT of levels.

The problem is that some of the values are duplicated, and the next step in my analysis does not accept duplicates. So I need to rename the duplicated entries before I can move on.

Let me give you a little example:

Say we have this simple data frame with one column:

> df
col_foo
1   bar1
2   bar2
3   bar3
4   bar2
5   bar4
6   bar5
7   bar3

If we look at the column, we see that it is a factor with 5 distinct levels.

> df$col_foo
[1] bar1 bar2 bar3 bar2 bar4 bar5 bar3
Levels: bar1 bar2 bar3 bar4 bar5
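To reproduce the example, the data frame can be rebuilt with an explicit factor() call (since R 4.0.0, data.frame() no longer converts strings to factors by default). The base function duplicated() then flags every occurrence after the first, which is exactly the set of rows that would need a new name. A minimal sketch:

```r
# Rebuild the example; factor() keeps col_foo a factor
# regardless of the stringsAsFactors default
df <- data.frame(
  col_foo = factor(c("bar1", "bar2", "bar3", "bar2", "bar4", "bar5", "bar3"))
)

nlevels(df$col_foo)
# [1] 5

# TRUE marks the second and later occurrences of a value
duplicated(df$col_foo)
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
```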

Here is the problem: the values bar2 and bar3 each appear twice. I want to add a new level name, something like bar2_X, and use it to replace only the duplicated (second) occurrence. The data frame should then become:

> df
col_foo
1   bar1
2   bar2
3   bar3
4   bar2_X
5   bar4
6   bar5
7   bar3_X

Is that possible? I cannot change the class of the column: it must remain a factor, so a solution that converts it to another type only helps if the result can be coerced back to a factor.
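A minimal sketch of what the title describes, assuming a plain "_X" suffix is acceptable: register the new labels as levels of the factor first, then assign them to the duplicated positions, so the column stays a factor throughout. The names f, dup and new_labels are illustrative. One caveat: a value that occurs three or more times would get the same "_X" label more than once, so this only fits data where each value appears at most twice.

```r
df <- data.frame(
  col_foo = factor(c("bar1", "bar2", "bar3", "bar2", "bar4", "bar5", "bar3"))
)

f <- df$col_foo
dup <- duplicated(f)                               # TRUE at rows 4 and 7
new_labels <- paste0(as.character(f[dup]), "_X")   # "bar2_X" "bar3_X"

# Add the new labels as levels first; assigning a value that is
# not already a level would produce NA with a warning
levels(f) <- union(levels(f), new_labels)
f[dup] <- new_labels

df$col_foo <- f
# values: bar1 bar2 bar3 bar2_X bar4 bar5 bar3_X
# levels: bar1 bar2 bar3 bar4 bar5 bar2_X bar3_X
```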

Thanks

asked Oct 27 '11 16:10 by Lianzinho


1 Answer

Call make.names with unique = TRUE on your column, then wrap the result in factor() so the column is still a factor:

df$col_foo <- factor(make.names(df$col_foo, unique = TRUE))
answered Sep 17 '22 22:09 by Richie Cotton
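Note that make.names(unique = TRUE) de-duplicates by appending .1, .2, ... to repeated values, so the second bar2 becomes bar2.1 rather than the bar2_X shown in the question. It also rewrites any value that is not a syntactically valid R name, for example prefixing X to a value that starts with a digit. If the labels should otherwise stay untouched, base R's make.unique() does only the de-duplication and accepts a custom separator. A sketch comparing the two on the example data (the variable names are illustrative):

```r
df <- data.frame(
  col_foo = factor(c("bar1", "bar2", "bar3", "bar2", "bar4", "bar5", "bar3"))
)

# make.names() converts its input to character internally;
# factor() turns the result back into a factor
via_make_names <- factor(make.names(df$col_foo, unique = TRUE))
# values: bar1 bar2 bar3 bar2.1 bar4 bar5 bar3.1

# make.unique() requires a character vector; sep = "_" changes the suffix style
via_make_unique <- factor(make.unique(as.character(df$col_foo), sep = "_"))
# values: bar1 bar2 bar3 bar2_1 bar4 bar5 bar3_1

# make.names() also alters labels that are not valid R names
make.names(c("2bar", "bar 1"))
# [1] "X2bar" "bar.1"
```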