Make all elements unique in a dataframe

Tags:

r

Assume I have data that looks like the following.

Across the entire data frame, I have three A's, two B's, two C's, and only one each of D, E, and F.

data <- read.table(textConnection("
col1 col2 
A B
A C
B A
C D
E F
"), header = TRUE)

What I want is to keep the order and contents the same, BUT make every value unique. For example, the three A's should become A.1, A.2, and A.3:

col1 col2 
A.1 B.2
A.2 C.2
B.1 A.3
C.1 D
E F

Is there a smart way to do this?

I know about make.unique() and make.names(), but they seem to work on only one column at a time, not on the entire data set.
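To illustrate the limitation: a minimal base-R sketch (assuming character columns, hence the explicit stringsAsFactors = FALSE) showing that make.unique() applied column by column with lapply() only removes duplicates within each column, so A still appears in both col1 and col2.

```r
data <- read.table(textConnection("
col1 col2
A B
A C
B A
C D
E F
"), header = TRUE, stringsAsFactors = FALSE)

# make.unique() works on a single character vector ...
make.unique(c("A", "A", "B"))   # "A" "A.1" "B"

# ... so applying it per column only de-duplicates within each column
per_col <- data
per_col[] <- lapply(per_col, make.unique)
per_col
# col1 becomes A, A.1, B, C, E
# col2 stays   B, C, A, D, F   (A is still present in both columns)
```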

asked Jul 12 '17 at 08:07 by Sakura

2 Answers

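Flatten all columns into a single vector with unlist() (col1 first, then col2), make that vector unique, and assign it back with data[] <- so the data frame structure is kept:

data[] <- make.unique(as.character(unlist(data)))

This gives:

> data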
  col1 col2
1    A  B.1
2  A.1  C.1
3    B  A.2
4    C    D
5    E    F
answered Oct 27 '22 at 20:10 by Jaap


The OP requires the values in the data.frame to be unique across all columns. This is a strong indicator that the data should rather be reshaped from wide to long format, where all manipulations can be performed on a single column instead of many.

library(data.table)
DT <- data.table(data)
# reshape to long format, then append a running count per distinct value
molten <- melt(DT, measure.vars = names(DT))[
  , value := paste(value, rowid(value), sep = ".")]
molten
    variable value
 1:     col1   A.1
 2:     col1   A.2
 3:     col1   B.1
 4:     col1   C.1
 5:     col1   E.1
 6:     col2   B.2
 7:     col2   C.2
 8:     col2   A.3
 9:     col2   D.1
10:     col2   F.1

rowid() is a data.table convenience function that generates a running row id within each group; here, each distinct value forms a group.
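As a rough base-R analogue (a sketch for illustration, not data.table's implementation), ave() with FUN = seq_along produces the same running count per distinct value. The vector x below is assumed to hold the col1 values followed by the col2 values, in the order melt() stacks them.

```r
# Values of col1 followed by col2, as melt() stacks them
x <- c("A", "A", "B", "C", "E", "B", "C", "A", "D", "F")

# Running count of each distinct value, in order of appearance
run_id <- ave(seq_along(x), x, FUN = seq_along)
run_id
# [1] 1 2 1 1 1 2 2 3 1 1

paste(x, run_id, sep = ".")
# "A.1" "A.2" "B.1" "C.1" "E.1" "B.2" "C.2" "A.3" "D.1" "F.1"
```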

Any further processing can be done in long format. Finally, the data can be reshaped back to wide format:

molten[, rn := rowid(variable)][, dcast(.SD, rn ~ variable)][, rn := NULL][]
   col1 col2
1:  A.1  B.2
2:  A.2  C.2
3:  B.1  A.3
4:  C.1  D.1
5:  E.1  F.1
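Without data.table, the same wide-to-long-and-back round trip can be sketched in base R, with stack() standing in for melt() and split() for dcast(). This is a swapped-in base-R technique rather than part of the answer above, and it assumes character columns (the default since R 4.0.0, made explicit here).

```r
data <- read.table(textConnection("
col1 col2
A B
A C
B A
C D
E F
"), header = TRUE, stringsAsFactors = FALSE)

# Wide -> long: one 'values' column plus a factor 'ind' naming the source column
long <- stack(data)

# Number each value by its occurrence across the whole data set (like rowid())
long$values <- paste(long$values,
                     ave(seq_along(long$values), long$values, FUN = seq_along),
                     sep = ".")

# Long -> wide: split the values back into one column per original column
wide <- as.data.frame(split(long$values, long$ind), stringsAsFactors = FALSE)
wide
#   col1 col2
# 1  A.1  B.2
# 2  A.2  C.2
# 3  B.1  A.3
# 4  C.1  D.1
# 5  E.1  F.1
```

split() orders the resulting columns by the factor levels of ind, which stack() creates in the original column order.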

Jaap's make.unique() approach can be used as well:

melt(DT, measure.vars = names(DT))[, value := make.unique(value)][]
    variable value
 1:     col1     A
 2:     col1   A.1
 3:     col1     B
 4:     col1     C
 5:     col1     E
 6:     col2   B.1
 7:     col2   C.1
 8:     col2   A.2
 9:     col2     D
10:     col2     F
answered Oct 27 '22 at 21:10 by Uwe
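Both answers differ slightly from the output requested in the question: make.unique() leaves the first occurrence unsuffixed (A, A.1, A.2), and paste() with rowid() also numbers values that occur only once (D.1). If the exact format from the question is needed (A.1, A.2, A.3 but plain D, E, F), a minimal base-R sketch that suffixes only values occurring more than once could look like this; the helper names occurrence and total are illustrative only.

```r
data <- read.table(textConnection("
col1 col2
A B
A C
B A
C D
E F
"), header = TRUE, stringsAsFactors = FALSE)

v <- as.character(unlist(data, use.names = FALSE))    # col1 values, then col2 values

occurrence <- ave(seq_along(v), v, FUN = seq_along)   # 1st, 2nd, ... occurrence of each value
total      <- ave(seq_along(v), v, FUN = length)      # how often each value occurs overall

# Suffix only the values that are duplicated somewhere in the data,
# then refill the data frame column by column
data[] <- ifelse(total > 1, paste(v, occurrence, sep = "."), v)
data
#   col1 col2
# 1  A.1  B.2
# 2  A.2  C.2
# 3  B.1  A.3
# 4  C.1    D
# 5    E    F
```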