Add a new level to a factor and substitute an existing one

Question

I'm having trouble dealing with the level names of a factor column in a data frame.

I have a big data frame in which one of the columns is a factor with a LOT of levels.

The problem is that some of the values in this column are duplicated, and the next step in my analysis does not accept duplicated values. So I need to rename the duplicated entries before I can move on.

Let me give you a little example:

Say we have this simple data frame with one column:

> df
  col_foo
1    bar1
2    bar2
3    bar3
4    bar2
5    bar4
6    bar5
7    bar3

If we look at the column, we see that it is a factor with 5 distinct levels.

> df$col_foo
[1] bar1 bar2 bar3 bar2 bar4 bar5 bar3
Levels: bar1 bar2 bar3 bar4 bar5
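
For anyone who wants to reproduce this, the example can be rebuilt roughly as follows. This is a minimal sketch: the name dup is only illustrative, and stringsAsFactors = TRUE is set explicitly because R 4.0.0 and later no longer convert strings to factors by default.

```r
# Rebuild the example data frame with col_foo stored as a factor.
df <- data.frame(
  col_foo = c("bar1", "bar2", "bar3", "bar2", "bar4", "bar5", "bar3"),
  stringsAsFactors = TRUE
)

# duplicated() flags the second and later occurrences of each value,
# so the first bar2 and the first bar3 are left unflagged.
dup <- duplicated(df$col_foo)
which(dup)
# [1] 4 7
```

Rows 4 and 7 are the ones that need a new name.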

Now for the problem. Notice that the values bar2 and bar3 each appear twice. What I want to know is how to add a new level name, something like bar2_X, and assign it only to the duplicated occurrence. The data frame should then look like this:

> df
  col_foo
1    bar1
2    bar2
3    bar3
4  bar2_X
5    bar4
6    bar5
7  bar3_X

Is that possible? The column must remain a factor, so solutions that change its class will not solve my problem unless the result can be coerced back to a factor.

Thanks

Richie Cotton · Accepted Answer

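Call make.names with unique = TRUE on your column, then wrap the result in factor() so the column stays a factor. Note that repeated values get a numeric suffix (bar2.1, bar3.1) rather than _X.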

df$col_foo <- factor(make.names(df$col_foo, unique = TRUE))
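
To make the result concrete, here is a short sketch of what that call returns on the example data. It also shows one possible variant for getting a literal _X suffix. That second part is an assumption on my side rather than something make.names does for you, and the names x and dup are only illustrative. Also keep in mind that make.names rewrites values that are not syntactically valid R names (spaces become dots, for instance), so make.unique may be the safer choice if the values must otherwise stay untouched.

```r
df <- data.frame(
  col_foo = c("bar1", "bar2", "bar3", "bar2", "bar4", "bar5", "bar3"),
  stringsAsFactors = TRUE
)

# make.names() coerces its input to character; unique = TRUE appends
# ".1", ".2", ... to the second and later occurrences of a value.
unique_names <- make.names(df$col_foo, unique = TRUE)
unique_names
# [1] "bar1"   "bar2"   "bar3"   "bar2.1" "bar4"   "bar5"   "bar3.1"

# Illustrative variant for a literal "_X" suffix: work on the character
# values, rename only the flagged duplicates, then coerce back to a factor.
x <- as.character(df$col_foo)
dup <- duplicated(x)
x[dup] <- paste0(x[dup], "_X")
df$col_foo <- factor(x)

as.character(df$col_foo)
# [1] "bar1"   "bar2"   "bar3"   "bar2_X" "bar4"   "bar5"   "bar3_X"
```

The _X variant only yields unique values when no value occurs more than twice; a third bar2 would become a second bar2_X. For that case the numeric suffixes from make.names or make.unique are more robust.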

Tags:

r

Lianzinho
