I have a data frame where a column has duplicate values, like
employee <- data.frame(name = c('John', 'Joe', 'Mat', 'John', 'Joe'),
                       salary = c(1500, 2000, 1700, 1210, 2100),
                       startdate = c('2012-05-10', '2015-02-17',
                                     '2014-09-11', '2011-11-23', '2010-10-27'))
I can get the unique elements of column 1 (name) with
unique(employee$name)
However, I want to make each item in the name column unique: if a value appears a second time, append _1 to it; if it appears again, append _2 to it, and so on. So, in the employee data frame, I want to change the name column to
John
Joe
Mat
John_1
Joe_1
Is there a way to do this without looping over it?
We can use make.names with unique=TRUE. By default, a . is placed before the suffix numbers, and that can be replaced with _ using sub:
employee$name <- sub('[.]', '_', make.names(employee$name, unique=TRUE))
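One caveat with the make.names route: make.names first converts its input into syntactically valid R names, so characters such as spaces are also turned into dots, and sub() then replaces only the first dot in each string. The names "Ann Lee" and "Mat" below are hypothetical, chosen only to show this; make.unique with sep = "_" leaves the original text untouched.

```r
# Hypothetical names; "Ann Lee" contains a space, which is not valid in an R name
x <- c("Ann Lee", "Ann Lee", "Mat")

# make.names() turns the space into "." and adds ".1" to the duplicate;
# sub() then swaps only the first "." for "_"
via_make_names <- sub('[.]', '_', make.names(x, unique = TRUE))
via_make_names   # "Ann_Lee" "Ann_Lee.1" "Mat"

# make.unique() keeps the original text and only appends the suffix
via_make_unique <- make.unique(x, sep = "_")
via_make_unique  # "Ann Lee" "Ann Lee_1" "Mat"
```

For simple names like the ones in the question, both approaches give the same result.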
Or, a better option suggested by @DavidArenburg is make.unique, which takes the separator directly through its sep argument. If the name column is of class factor, convert it to character (as.character) before applying make.unique:
make.unique(as.character(employee$name), sep = "_")
#[1] "John" "Joe" "Mat" "John_1" "Joe_1"
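For comparison, the numbering rule from the question (second occurrence gets _1, third gets _2, and so on) can also be written without a loop by counting occurrences within each group. This is a minimal base-R sketch using ave(); the helper name add_suffix is hypothetical. Unlike make.unique, it does not check whether a generated name such as John_1 already exists elsewhere in the column.

```r
# Hypothetical helper: suffix repeated values with their occurrence number
add_suffix <- function(x, sep = "_") {
  x <- as.character(x)  # also covers a factor column
  # Running count per value (1, 2, 3, ...), shifted so the first occurrence is 0
  n <- ave(seq_along(x), x, FUN = seq_along) - 1
  # Keep first occurrences as they are; append the count to later ones
  ifelse(n == 0, x, paste0(x, sep, n))
}

employee <- data.frame(name = c('John', 'Joe', 'Mat', 'John', 'Joe'),
                       salary = c(1500, 2000, 1700, 1210, 2100),
                       stringsAsFactors = FALSE)
employee$name <- add_suffix(employee$name)
employee$name  # expected: "John" "Joe" "Mat" "John_1" "Joe_1"
```

In practice make.unique is shorter and safer; this version is mainly useful to see how the suffixes are derived.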