
How to remove columns with same value in R

Tags: r

In short:

I want to do this with my table,

(image of the example table not available)

Explanation:

I have a big table with 20,000 x 1,200 items. I want to remove every column whose values are all the same from top to bottom. The remaining columns should keep their variable names (V2 in the example) so that I can later figure out which ones were removed.

asked May 30 '15 09:05 by Dev




2 Answers

Just use vapply to go through and check how many unique values there are in each column:

Sample data:

mydf <- data.frame(v1 = 1:4, v2 = 5:8,
                   v3 = 2, v4 = 9:12, v5 = 1)
mydf
##   v1 v2 v3 v4 v5
## 1  1  5  2  9  1
## 2  2  6  2 10  1
## 3  3  7  2 11  1
## 4  4  8  2 12  1

What we will be doing with vapply:

vapply(mydf, function(x) length(unique(x)) > 1, logical(1L))
#    v1    v2    v3    v4    v5 
#  TRUE  TRUE FALSE  TRUE FALSE 

Keep the columns you want:

mydf[vapply(mydf, function(x) length(unique(x)) > 1, logical(1L))]
#   v1 v2 v4
# 1  1  5  9
# 2  2  6 10
# 3  3  7 11
# 4  4  8 12
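
The question also asks for a way to tell afterwards which columns were removed. A minimal sketch, reusing the sample data above; `keep`, `removed` and `result` are names introduced here only for illustration:

```r
mydf <- data.frame(v1 = 1:4, v2 = 5:8,
                   v3 = 2, v4 = 9:12, v5 = 1)

# TRUE for every column that has more than one distinct value
keep <- vapply(mydf, function(x) length(unique(x)) > 1, logical(1L))

# Names of the constant columns that are about to be dropped
removed <- names(mydf)[!keep]

# The surviving columns keep their original names
result <- mydf[keep]
```

Note that unique() counts NA as a value of its own, so a column that is constant apart from some NAs is kept by this test. If such columns should go as well, something like length(unique(x[!is.na(x)])) > 1 could be used instead.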
answered Sep 30 '22 21:09 by A5C1D2H2I1M1N2O1R2T1


In case someone wants to do this with dplyr, here is yet another way to do it:

library(dplyr)
mydf %>% select(where(~n_distinct(.) > 1))
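
To also see which columns were dropped, the names before and after can be compared. This is only a sketch on the same sample data; `result` and `removed` are illustrative names, and it assumes a dplyr version that provides where() inside select() (1.0.0 or later, as far as I know):

```r
library(dplyr)

mydf <- data.frame(v1 = 1:4, v2 = 5:8,
                   v3 = 2, v4 = 9:12, v5 = 1)

# Keep only the columns with more than one distinct value
result <- mydf %>% select(where(~ n_distinct(.) > 1))

# Columns present in the input but missing from the result
removed <- setdiff(names(mydf), names(result))
```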
answered Sep 30 '22 21:09 by zeehio