
Index of non-unique element in data frame

Tags:

r

How can I extract the column names (or the row and column indices) of the duplicated elements in the following data frame?

            V1         V2           V3           V4
PC1  0.5863431  0.5863431 3.952237e-01 3.952237e-01
PC2 -0.3952237 -0.3952237 5.863431e-01 5.863431e-01
PC3 -0.7071068  0.7071068 1.665335e-16 3.885781e-16

For example, in row PC1 the value 0.5863431 appears in both V1 and V2, so "V1" and "V2" are among the column names I want.

For that data frame I want to get:

[1] "V1" "V2" "V3" "V4"

As you can see, the first row alone is already enough to give this result.

Second example:

            V1         V2          V3         V4
PC1 -0.5987139 -0.5987139 -0.03790446  0.5307039
PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
PC3  0.3986891  0.3523926 -0.11045319  0.8394442

Result:

[1] "V1" "V2"
Asked by Султан Гашимов on May 22 '16 at 17:05



1 Answer

There may be a better way, but here's my take on it.

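## coerce to matrix (if not already)
m <- as.matrix(df)
## flag every element whose value occurs more than once anywhere in the matrix;
## MARGIN = 0 compares individual elements instead of whole rows, and the
## fromLast = TRUE pass also marks the first occurrence of each repeated value
d <- duplicated(m, MARGIN = 0) | duplicated(m, MARGIN = 0, fromLast = TRUE)
## grab the unique names of the columns that contain a flagged element
colnames(m)[unique(col(d)[d])]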
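
If you need this for more than one data frame, the three steps above can be wrapped in a small helper. This is only a sketch: the name `dup_colnames` and the toy data frame `toy` are made up for illustration and do not come from the question.

```r
# Hypothetical helper (the name is an assumption, not an existing function):
# returns the names of the columns that hold at least one value
# occurring more than once anywhere in the input.
dup_colnames <- function(x) {
  m <- as.matrix(x)
  # element-wise duplicate flags, marking first and later occurrences alike
  d <- duplicated(m, MARGIN = 0) | duplicated(m, MARGIN = 0, fromLast = TRUE)
  colnames(m)[unique(col(d)[d])]
}

# Toy data: the value 1 appears in both A and B, nothing else repeats
toy <- data.frame(A = c(1, 2), B = c(1, 3), C = c(4, 5))
dup_colnames(toy)
# [1] "A" "B"
```

When no value repeats, the helper returns an empty character vector.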

Examples: on your first data frame:

df1 <- read.table(text = "V1         V2           V3           V4
PC1  0.5863431  0.5863431 3.952237e-01 3.952237e-01
PC2 -0.3952237 -0.3952237 5.863431e-01 5.863431e-01
PC3 -0.7071068  0.7071068 1.665335e-16 3.885781e-16", header = TRUE)

m1 <- as.matrix(df1)
d1 <- duplicated(m1, MARGIN = 0) | duplicated(m1, MARGIN = 0, fromLast = TRUE)
colnames(m1)[unique(col(d1)[d1])]
# [1] "V1" "V2" "V3" "V4"

And on the second:

df2 <- read.table(text = "V1         V2          V3         V4
PC1 -0.5987139 -0.5987139 -0.03790446  0.5307039
PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
PC3  0.3986891  0.3523926 -0.11045319  0.8394442", header = TRUE)

m2 <- as.matrix(df2)
d2 <- duplicated(m2, MARGIN = 0) | duplicated(m2, MARGIN = 0, fromLast = TRUE)
colnames(m2)[unique(col(d2)[d2])]
# [1] "V1" "V2"
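
The question also mentions the row and column index. One possible way to get those from the same logical matrix is `which()` with `arr.ind = TRUE`. The sketch below rebuilds the second example so it runs on its own; the name `dup_positions` is just an illustrative choice.

```r
# Second example from the question, rebuilt so this block is self-contained
df2 <- read.table(text = "V1         V2          V3         V4
PC1 -0.5987139 -0.5987139 -0.03790446  0.5307039
PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
PC3  0.3986891  0.3523926 -0.11045319  0.8394442", header = TRUE)

m2 <- as.matrix(df2)
# TRUE for every element whose value occurs more than once in the matrix
d2 <- duplicated(m2, MARGIN = 0) | duplicated(m2, MARGIN = 0, fromLast = TRUE)

# one row per flagged element, giving its row and column position
idx <- which(d2, arr.ind = TRUE)

# attach the row names, column names and values for readability
dup_positions <- data.frame(
  row   = rownames(m2)[idx[, "row"]],
  col   = colnames(m2)[idx[, "col"]],
  value = m2[idx]
)
dup_positions
```

Here `idx` holds the numeric positions, and `dup_positions` lists the four flagged cells (PC1 and PC2 in columns V1 and V2) together with their values.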

Side note: Since your data is entirely numeric, I would recommend starting with a matrix instead of a data frame.

Answered by Rich Scriven on Sep 28 '22 at 18:09