I'm having trouble dealing with the level names of a factor in a data frame.
I have a big data frame in which one of the columns is a factor with a LOT of levels.
The problem is that some of the values are duplicated, and the next step in my analysis does not accept duplicated data. So I need to rename the duplicated values before I can move on.
Here is a small example.
Say we have this simple data frame with one column:
> df
col_foo
1 bar1
2 bar2
3 bar3
4 bar2
5 bar4
6 bar5
7 bar3
If we look at the column, we see that it is a factor with 5 distinct levels.
> df$col_foo
[1] bar1 bar2 bar3 bar2 bar4 bar5 bar3
Levels: bar1 bar2 bar3 bar4 bar5
Now the problem: the values bar2 and bar3 each appear twice. How can I add a new level name, something like bar2_X, and replace only the duplicated occurrence with it? The data frame should become this:
> df
col_foo
1 bar1
2 bar2
3 bar3
4 bar2_X
5 bar4
6 bar5
7 bar3_X
Is that possible? The column has to remain a factor, so a solution that changes its class only works for me if the result can be coerced back to a factor.
Thanks
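Call make.names with unique = TRUE on your column and wrap the result in factor() so the column stays a factor. Note that the repeated values get a numeric suffix (bar2.1, bar3.1) rather than _X: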
df$col_foo <- factor(make.names(df$col_foo, unique = TRUE))
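If the exact bar2_X form from the question is needed, here is a minimal sketch that flags repeats with duplicated() and appends the suffix by hand. The variable names vals, dup and by_make_names are only illustrative. Keep in mind that make.names also rewrites values that are not syntactically valid R names (for example, it replaces spaces with dots), which may or may not be wanted for real data.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  col_foo = factor(c("bar1", "bar2", "bar3", "bar2", "bar4", "bar5", "bar3"))
)

# The make.names route: repeats get a numeric suffix (".1", ".2", ...).
by_make_names <- factor(make.names(df$col_foo, unique = TRUE))

# Manual route: work on the character values, not on the factor codes.
vals <- as.character(df$col_foo)

# duplicated() is TRUE for every occurrence after the first one.
dup <- duplicated(vals)

# Append the suffix only to the repeated occurrences.
vals[dup] <- paste0(vals[dup], "_X")

# Coerce back to a factor so the column keeps its class.
df$col_foo <- factor(vals)
```

This only guarantees unique values when each value appears at most twice. A value that occurs three or more times would end up with several identical _X entries; in that case make.unique(vals, sep = "_") is a safer choice, at the cost of producing bar2_1 instead of bar2_X.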