Index of non-unique element in data frame

Tags: r

How can I extract the column names (or the row and column indices) of the duplicated elements in the following data frame?

            V1         V2           V3           V4
PC1  0.5863431  0.5863431 3.952237e-01 3.952237e-01
PC2 -0.3952237 -0.3952237 5.863431e-01 5.863431e-01
PC3 -0.7071068  0.7071068 1.665335e-16 3.885781e-16

For example, the PC1 values in V1 and V2 are both 0.5863431, so "V1" and "V2" are among the column names I want.

For that data frame I want to get:

[1] "V1" "V2" "V3" "V4"

As you can see, this result already follows from looking at the first row alone.

Second example:

            V1         V2          V3         V4
PC1 -0.5987139 -0.5987139 -0.03790446  0.5307039
PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
PC3  0.3986891  0.3523926 -0.11045319  0.8394442

Result:

[1] "V1" "V2"


asked May 22 '16 17:05

Султан Гашимов

1 Answer

There may be a better way, but here's my take on it.

## coerce to matrix (if not already)
m <- as.matrix(df)
## find duplicates across both margins
d <- duplicated(m, MARGIN = 0) | duplicated(m, MARGIN = 0, fromLast = TRUE)
## grab the unique col names
colnames(m)[unique(col(d)[d])]
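
The question also mentions row and column indices. A minimal sketch of one way to get them, assuming the same kind of logical matrix as `d` above, is to pass it to `which()` with `arr.ind = TRUE`. The small matrix `m0` and the names `d0` and `idx` are made up for illustration and are not the asker's data.

```r
## hypothetical 2 x 3 matrix (illustration only): the value 1 appears twice in row r1
m0 <- matrix(c(1, 2, 1, 3, 4, 5), nrow = 2,
             dimnames = list(c("r1", "r2"), c("A", "B", "C")))
## flag every element whose value occurs more than once anywhere in the matrix
d0 <- duplicated(m0, MARGIN = 0) | duplicated(m0, MARGIN = 0, fromLast = TRUE)
## one row per flagged element, with columns "row" and "col"
idx <- which(d0, arr.ind = TRUE)
idx
## the column names can be recovered from the "col" column
colnames(m0)[unique(idx[, "col"])]
# [1] "A" "B"
```

Each row of `idx` is one duplicated element, so the same object gives both the positions and, through `colnames()`, the names.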

Examples: On your first data frame -

df1 <- read.table(text = "V1         V2           V3           V4
PC1  0.5863431  0.5863431 3.952237e-01 3.952237e-01
PC2 -0.3952237 -0.3952237 5.863431e-01 5.863431e-01
PC3 -0.7071068  0.7071068 1.665335e-16 3.885781e-16", header = TRUE)

m1 <- as.matrix(df1)
d1 <- duplicated(m1, MARGIN = 0) | duplicated(m1, MARGIN = 0, fromLast = TRUE)
colnames(m1)[unique(col(d1)[d1])]
# [1] "V1" "V2" "V3" "V4"

And on the second -

df2 <- read.table(text = "V1         V2          V3         V4
PC1 -0.5987139 -0.5987139 -0.03790446  0.5307039
PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
PC3  0.3986891  0.3523926 -0.11045319  0.8394442", header = TRUE)

m2 <- as.matrix(df2)
d2 <- duplicated(m2, MARGIN = 0) | duplicated(m2, MARGIN = 0, fromLast = TRUE)
colnames(m2)[unique(col(d2)[d2])]
# [1] "V1" "V2"

Side note: Since your data contains all numeric values I would recommend beginning with a matrix instead of a data frame.
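
One caveat: `duplicated()` compares values exactly, so two numbers that print the same but differ in later digits will not be matched. If that matters for your data, a possible workaround is to round before comparing. This is only a sketch under stated assumptions: the matrix `m3` is hypothetical, and the choice of 7 decimal places is an assumed tolerance, not something taken from the question.

```r
## hypothetical matrix: the two PC1 entries differ only in the 8th decimal place
m3 <- matrix(c(0.58634311, 0.1, 0.58634312, 0.2), nrow = 2,
             dimnames = list(c("PC1", "PC2"), c("V1", "V2")))
## exact comparison finds no duplicates
any(duplicated(m3, MARGIN = 0))
# [1] FALSE
## round to 7 decimals first (assumed tolerance), then apply the same idea
r3 <- round(m3, 7)
d3 <- duplicated(r3, MARGIN = 0) | duplicated(r3, MARGIN = 0, fromLast = TRUE)
colnames(m3)[unique(col(d3)[d3])]
# [1] "V1" "V2"
```

How many digits to keep depends on how the values were produced, so treat the rounding step as something to tune rather than a fixed rule.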

answered Sep 28 '22 18:09

Rich Scriven
