
Removing all columns summing to zero with dplyr

Tags: r, dplyr

I'm currently working on a dataframe that looks something like this:

Site  Spp1  Spp2  Spp3  LOC  TYPE
S01   2     4     0     A    FLOOD
S02   4     0     0     A    REG
....
S10   0     1     0     B    FLOOD
S11   1     0     0     B    REG

What I'm trying to do is subset the dataframe so I can run some indicator species analysis in R.

The following code works in that I create two subsets of the data, merge them into one frame and then drop the unused factor levels:

library(dplyr)
A.flood <- filter(data, TYPE == "FLOOD", LOC == "A")
B.flood <- filter(data, TYPE == "FLOOD", LOC == "B")
# Stack the two subsets, then drop the unused factor levels
A.B.flood <- rbind(A.flood, B.flood) %>% droplevels()

What I also need to do is drop all Spp columns (in my real dataset there are ~ 60) that sum to zero. Is there a way to achieve this with dplyr, and if so, is it possible to pipe that code onto the existing A.B.flood dataframe code?

Thanks!

EDIT

I managed to remove all the columns that summed to zero, by selecting only the columns that summed to > zero:

A.B.flood.subset <- A.B.flood[, apply(A.B.flood[1:(ncol(A.B.flood))], 2, sum)!=0]
KaanKaant asked Dec 03 '15 07:12



2 Answers

For those on dplyr 1.0.0 or later, the where() selection helper can be used to keep only the numeric columns whose sum is non-zero:

A.B.flood %>% 
  select(where( ~ is.numeric(.x) && sum(.x) != 0))

returns:

  Spp1 Spp2
1    2    4
2    4    0
3    0    0
4    4    0

using the same data given by @akrun:

A.B.flood <- structure(
  list(
    Site = c("S01", "S02", "S03", "S04"),
    Spp1 = c(2L, 4L, 0L, 4L),
    Spp2 = c(4L, 0L, 0L, 0L),
    Spp3 = c(0L, 0L, 0L, 0L),
    LOC = c("A", "A", "A", "A"),
    TYPE = c("FLOOD", "REG", "FLOOD", "REG")
  ),
  .Names = c("Site", "Spp1", "Spp2", "Spp3", "LOC", "TYPE"),
  class = "data.frame",
  row.names = c(NA, -4L)
)
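
The where() call above also drops the non-numeric Site, LOC and TYPE columns. If those should be kept, one possible sketch is to keep a column when it is either non-numeric or has a non-zero sum, and to pipe that straight onto the filtering step from the question. This assumes dplyr >= 1.0.0; the toy data frame `dat` and the helper name `keep_col` are made up for illustration and are not part of the original post.

```r
library(dplyr)

# Hypothetical toy data mirroring the layout shown in the question
dat <- data.frame(
  Site = c("S01", "S02", "S10", "S11"),
  Spp1 = c(2L, 4L, 0L, 1L),
  Spp2 = c(4L, 0L, 1L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = factor(c("A", "A", "B", "B")),
  TYPE = factor(c("FLOOD", "REG", "FLOOD", "REG"))
)

# Keep a column if it is non-numeric, or numeric with a non-zero sum
keep_col <- function(x) !is.numeric(x) || sum(x, na.rm = TRUE) != 0

A.B.flood.subset <- dat %>%
  filter(TYPE == "FLOOD", LOC %in% c("A", "B")) %>%  # both subsets in one step
  droplevels() %>%                                   # drop unused factor levels
  select(where(keep_col))                            # drop zero-sum numeric columns

names(A.B.flood.subset)
```

On this toy data only Spp3 is dropped; Site, LOC and TYPE are retained. The sums are computed after filtering, so a species that occurs only at REG sites would also be removed.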
Agile Bean answered Sep 27 '22 17:09


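I realize this question is now quite old, but I came across it and found another solution using dplyr's select() together with which(), which might seem clearer to dplyr enthusiasts. Note that colSums() requires every column to be numeric, so this works on a data frame that holds only the Spp counts: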

A.B.flood.subset <- A.B.flood %>% select(which(!colSums(A.B.flood, na.rm=TRUE) %in% 0))
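
For a data frame that still contains non-numeric columns such as Site, LOC or TYPE, colSums() cannot be applied to the whole frame. A possible variant, sketched here assuming dplyr >= 1.0.0, computes the column sums on the numeric columns alone and then drops the zero-sum ones by name. The data frame `dat` reuses the example values from the other answer; the names `num_cols`, `zero_cols` and `dat.subset` are illustrative only.

```r
library(dplyr)

# Example values taken from the other answer on this page
dat <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG")
)

# Restrict the sum to numeric columns so colSums() does not fail
num_cols  <- dat %>% select(where(is.numeric))
zero_cols <- names(num_cols)[colSums(num_cols, na.rm = TRUE) == 0]

# Drop the zero-sum columns by name, leaving all other columns in place
dat.subset <- dat %>% select(-all_of(zero_cols))
```

Selecting by name with all_of() keeps the original column order and leaves the identifier columns untouched.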
Whizz answered Sep 27 '22 18:09