In short:
I want to do this with my table,
Explanation:
I have a big table with 20,000 x 1,200 items. I want to remove all the columns whose values are all the same from top to bottom. But it shouldn't change the variable names (V2 in the example), so that later I can figure out which ones were removed.
The easiest way to drop columns is the subset() function. For example, subset(df, select = -c(x, z)) tells R to drop the variables x and z. The '-' sign indicates dropping variables. Make sure the variable names are NOT in quotes when using subset().
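A minimal sketch of the subset() approach described above. The data frame df and its columns x, y and z are made up for illustration.

```r
# Hypothetical example data; x, y and z are illustrative column names.
df <- data.frame(x = 1:3, y = c("a", "b", "c"), z = c(TRUE, FALSE, TRUE))

# Drop x and z. Names are unquoted, and '-' means "exclude these columns".
df_small <- subset(df, select = -c(x, z))

# Only y is left; the rows are untouched.
print(names(df_small))
```

The result is still a data frame, even though only one column remains.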
To remove a character from an R data frame column, we can use the gsub function, which replaces the character with an empty string. For example, if we have a data frame called df with a character column x whose values each contain the string ID, it can be removed with the command gsub("ID","",as.
We can delete multiple columns from an R data frame by assigning NULL to them through the list() function.
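As a rough illustration of the NULL assignment, assuming a small made-up data frame df with columns x, y and z:

```r
# Hypothetical example data; the column names are illustrative.
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)

# Assigning list(NULL) to several columns at once deletes them in place.
df[c("x", "z")] <- list(NULL)

# The remaining columns keep their original names.
print(names(df))
```

Unlike subset(), this modifies df itself, so keep a copy if the original columns are needed later.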
To remove rows with NA values in R, we can use the na.omit() and drop_na() (tidyr) functions.
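A short sketch of the base R variant, na.omit(), on made-up data. The columns id, score and grade are assumptions for the example; tidyr's drop_na() is not used here so that the snippet runs without extra packages.

```r
# Hypothetical example data with missing values in two different columns.
df <- data.frame(
  id    = 1:4,
  score = c(10, NA, 30, 40),
  grade = c("a", "b", NA, "d")
)

# na.omit() drops every row that has an NA in any column.
clean <- na.omit(df)

# Rows 2 and 3 each contain an NA, so only ids 1 and 4 remain.
print(clean)
```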
Just use vapply to go through and check how many unique values there are in each column:
Sample data:
mydf <- data.frame(v1 = 1:4, v2 = 5:8,
v3 = 2, v4 = 9:12, v5 = 1)
mydf
##   v1 v2 v3 v4 v5
## 1  1  5  2  9  1
## 2  2  6  2 10  1
## 3  3  7  2 11  1
## 4  4  8  2 12  1
What we will be doing with vapply:
vapply(mydf, function(x) length(unique(x)) > 1, logical(1L))
#    v1    v2    v3    v4    v5
#  TRUE  TRUE FALSE  TRUE FALSE
Keep the columns you want:
mydf[vapply(mydf, function(x) length(unique(x)) > 1, logical(1L))]
#   v1 v2 v4
# 1  1  5  9
# 2  2  6 10
# 3  3  7 11
# 4  4  8 12
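Since the question also asks which columns were removed, the same vapply check can be wrapped in a small helper that returns both the filtered table and the dropped names. This is only a sketch: drop_constant_cols is a hypothetical name, not part of the original answer.

```r
# Hypothetical helper built on the vapply check from the answer.
drop_constant_cols <- function(d) {
  # TRUE for columns with more than one distinct value.
  keep <- vapply(d, function(x) length(unique(x)) > 1, logical(1L))
  # Return the kept columns (names unchanged) and the names that were dropped.
  list(data = d[keep], removed = names(d)[!keep])
}

# Same sample data as in the answer.
mydf <- data.frame(v1 = 1:4, v2 = 5:8,
                   v3 = 2, v4 = 9:12, v5 = 1)

res <- drop_constant_cols(mydf)
print(names(res$data))
print(res$removed)
```

Note that unique() counts NA as a value of its own, so a column holding one constant plus some NAs is kept by this check.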
In case someone tries to do this with dplyr, this is yet another way to do it:
library(dplyr)
mydf %>% select(where(~n_distinct(.) > 1))