Removing all columns summing to zero with dplyr

Tags:

r

dplyr

I'm currently working on a dataframe that looks something like this:

Site  Spp1  Spp2  Spp3  LOC  TYPE
S01   2     4     0     A    FLOOD
S02   4     0     0     A    REG
....
S10   0     1     0     B    FLOOD
S11   1     0     0     B    REG

What I'm trying to do is subset the dataframe so I can run some indicator species analysis in R.

The following code works: I create two subsets of the data, merge them into one frame, and then drop the unused factor levels:

A.flood <- filter(data, TYPE == "FLOOD", LOC == "A")
B.flood <- filter(data, TYPE == "FLOOD", LOC == "B")
# Combine the two subsets, then drop factor levels that no longer occur
A.B.flood <- rbind(A.flood, B.flood) %>% droplevels()

What I also need to do is drop all Spp columns (in my real dataset there are ~60) that sum to zero. Is there a way to achieve this with dplyr, and if so, is it possible to pipe that code onto the existing A.B.flood dataframe code?

Thanks!

EDIT

I managed to remove all the columns that summed to zero by selecting only the columns whose sum is not zero:

A.B.flood.subset <- A.B.flood[, apply(A.B.flood[1:(ncol(A.B.flood))], 2, sum)!=0]
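
One caveat with the apply() approach, as far as I can tell: apply() first converts the data frame to a matrix, and when character columns such as Site are present that matrix is character, so sum() can fail. Below is a minimal base R sketch that checks each column on its own instead. The toy values are made up for illustration, and `keep` is just a name chosen here, not something from the original code.

```r
# Toy data mirroring the question's layout (values are illustrative only)
A.B.flood <- data.frame(
  Site = c("S01", "S03", "S10"),
  Spp1 = c(2L, 0L, 0L),
  Spp2 = c(4L, 0L, 1L),
  Spp3 = c(0L, 0L, 0L),
  LOC  = c("A", "A", "B"),
  TYPE = c("FLOOD", "FLOOD", "FLOOD"),
  stringsAsFactors = FALSE
)

# TRUE for columns to keep: every non-numeric column, plus every
# numeric column whose sum is nonzero
keep <- vapply(
  A.B.flood,
  function(col) !is.numeric(col) || sum(col, na.rm = TRUE) != 0,
  logical(1)
)

# drop = FALSE keeps the result a data frame even if one column remains
A.B.flood.subset <- A.B.flood[, keep, drop = FALSE]
names(A.B.flood.subset)
```

With this toy data only Spp3 sums to zero, so it is the only column removed; Site, LOC and TYPE are kept because they are not numeric.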

asked Dec 03 '15 07:12

KaanKaant

2 Answers

For those using dplyr 1.0.0 or later, the where() selection helper lets you do:

A.B.flood %>% 
  select(where( ~ is.numeric(.x) && sum(.x) != 0))

returns:

  Spp1 Spp2
1    2    4
2    4    0
3    0    0
4    4    0

using the same data given by @akrun:

A.B.flood <- structure(
  list(
    Site = c("S01", "S02", "S03", "S04"),
    Spp1 = c(2L, 4L, 0L, 4L),
    Spp2 = c(4L, 0L, 0L, 0L),
    Spp3 = c(0L, 0L, 0L, 0L),
    LOC = c("A", "A", "A", "A"),
    TYPE = c("FLOOD", "REG", "FLOOD", "REG")
  ),
  .Names = c("Site", "Spp1", "Spp2", "Spp3", "LOC", "TYPE"),
  class = "data.frame",
  row.names = c(NA, -4L)
)

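The select() call above returns only the numeric columns. If the identifier columns (Site, LOC, TYPE) should survive too, and the whole thing should be piped from the original filter step as the question asks, something like the following sketch might work. It assumes dplyr 1.0.0 or later; the `data` object here is a hypothetical stand-in with made-up values, and filtering with `%in%` replaces the two separate filter() calls plus rbind(), so row order follows the original data rather than A-then-B.

```r
library(dplyr)

# Hypothetical stand-in for the question's `data` object
data <- data.frame(
  Site = c("S01", "S02", "S10", "S11"),
  Spp1 = c(2L, 4L, 0L, 1L),
  Spp2 = c(4L, 0L, 1L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = factor(c("A", "A", "B", "B")),
  TYPE = factor(c("FLOOD", "REG", "FLOOD", "REG"))
)

A.B.flood <- data %>%
  # keep FLOOD rows from locations A and B in one step
  filter(TYPE == "FLOOD", LOC %in% c("A", "B")) %>%
  # drop factor levels that no longer occur after filtering
  droplevels() %>%
  # keep non-numeric columns, and numeric columns with a nonzero sum
  select(where(~ !is.numeric(.x) || sum(.x, na.rm = TRUE) != 0))

names(A.B.flood)
```

The predicate uses `||` so that sum() is never called on a non-numeric column; `na.rm = TRUE` is an added assumption that missing counts should be ignored rather than propagate.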
answered Sep 27 '22 17:09

Agile Bean

I realize this question is now quite old, but I came across it and found another solution using dplyr's "select" together with "which", which might seem clearer to dplyr enthusiasts:

A.B.flood.subset <- A.B.flood %>% select(which(!colSums(A.B.flood, na.rm=TRUE) %in% 0))
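
Note that colSums() requires numeric input, so the line above can raise an error when the data frame still holds character columns such as Site. A hedged variant is sketched below: it computes the column sums on the numeric columns only and then drops the zero-sum ones by name. The helper names `num_cols` and `zero_cols` are made up for this example, and it assumes dplyr 1.0.0 or later for where() and all_of().

```r
library(dplyr)

# Same values as the example data used in the other answer
A.B.flood <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG")
)

# Restrict colSums() to the numeric columns, then collect the names
# of those whose sum is zero
num_cols  <- select(A.B.flood, where(is.numeric))
zero_cols <- names(num_cols)[colSums(num_cols, na.rm = TRUE) == 0]

# Drop the zero-sum columns by name; all other columns are untouched
A.B.flood.subset <- A.B.flood %>% select(-all_of(zero_cols))
names(A.B.flood.subset)
```

Selecting by name rather than by position keeps the result independent of where the numeric columns sit in the data frame.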

answered Sep 27 '22 18:09

Whizz
