Excluding columns from a dataframe based on column sums

I'm working on a data set that includes community data, and many of the columns (species) have a lot of zeroes. I would like to be able to drop these columns for some of the analyses I'm doing, based on the sum of the whole column. I'm tempted to do this with a for loop, but I hear that the apply and by functions are better when you're using R. My goal is to remove all columns with a sum of less than 15. I have used which() to remove rows by factors, e.g.,

September<-which(data$Time_point=="September")

data<-data[-September,]

and the two ways I've tried removing columns are by using apply():

data<-data[,apply(data,2,function(x)sum(x<=15))]

and by using a messy for loop/if else combo:

for (i in 6:length(data)){
    if (sum(data[,i])<=15)
    data[,i]<-NULL
    else 
    data[,i]<-data[,i]
    }

Neither of these methods has been working. Surely there is an elegant way to get rid of columns based on logical criteria?

str(head(data,10))
'data.frame':   10 obs. of  23 variables:
 $ Core_num    : Factor w/ 159 levels "152","153","154",..: 133 72 70 75 89 85 86 90 95 99
 $ Cage_num    : num  0 1 2 3 4 5 6 7 8 9
 $ Treatment   : Factor w/ 4 levels "","C","CC","NC": 1 2 2 2 2 2 2 2 2 2
 $ Site        : Factor w/ 10 levels "","B","B07","B08",..: 1 8 8 8 7 7 7 7 9 9
 $ Time_point  : Factor w/ 3 levels "","May","September": 1 2 2 2 2 2 2 2 2 2
 $ Spionidae   : num  108 0 0 0 0 0 0 0 0 0
 $ Syllidae    : num  185 0 0 0 3 8 0 1 4 1
 $ Opheliidae  : num  424 0 1 0 0 0 1 1 0 0
 $ Cossuridae  : num  164 0 7 3 0 0 0 0 0 0
 $ Sternaspidae: num  214 0 0 6 1 0 11 9 0 0
 $ Sabellidae  : num  1154 0 2 2 0 ...
 $ Capitellidae: num  256 1 10 17 0 3 0 0 0 0
 $ Dorvillidae : num  21 1 0 0 0 0 0 0 0 0
 $ Cirratulidae: num  17 0 0 0 0 0 0 0 0 0
 $ Oligochaeta : num  3747 12 41 27 32 ...
 $ Nematoda    : num  410 5 4 13 0 0 0 2 2 0
 $ Sipuncula   : num  33 0 0 0 0 0 0 0 0 0
 $ Ostracoda   : num  335 0 1 0 0 0 0 0 0 0
 $ Decapoda    : num  62 0 4 0 1 0 0 0 0 0
 $ Amphipoda   : num  2789 75 17 34 89 ...
 $ Copepoda    : num  75 0 0 0 0 0 0 0 0 0
 $ Tanaidacea  : num  84 0 0 0 1 0 0 0 0 0
 $ Mollusca    : int  55 0 4 0 0 0 0 0 0 0

asked May 15 '12 20:05

Margaret

1 Answer

What about a simple subset? First, we create a simple data frame:

R> dd = data.frame(x = runif(5), y = 20*runif(5), z=20*runif(5))

Then select the columns where the sum is greater than 15:

R> dd1 = dd[,colSums(dd) > 15]
R> ncol(dd1)
[1] 2
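To see why this works, it may help to look at the logical vector itself. The sketch below uses a small made-up data frame with fixed values (so the result is reproducible, unlike the runif() example) and also shows an apply() version that should give the same mask. Note that it sums each column first and then compares with 15, whereas sum(x<=15) counts how many values in the column are at most 15.

```r
# Hypothetical fixed data so the result is reproducible
dd <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30), z = c(0, 5, 4))

# One TRUE/FALSE per column: TRUE where the column sum exceeds 15
keep <- colSums(dd) > 15
print(keep)

# Same mask via apply(): sum each column, then compare with 15
keep_apply <- apply(dd, 2, sum) > 15

# drop = FALSE keeps the result a data frame even if only one column survives
dd1 <- dd[, keep, drop = FALSE]
```

colSums() is usually the simpler choice for an all-numeric data frame; the apply() form is only shown because the question asked about it.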

In your data set, you only want to subset columns 6 onwards, so something like:

 ##Drop the first five columns, then filter the remaining ones
 dd_sub = dd[,6:ncol(dd)]
 dd_sub[,colSums(dd_sub) > 15]

 #Keep the first five columns and filter columns 6 onwards
 cols_to_keep = c(rep(TRUE, 5), colSums(dd[,6:ncol(dd)]) > 15)
 dd[,cols_to_keep]

should work.
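As a rough illustration of keeping the leading metadata columns while filtering the count columns, here is a sketch on a made-up data frame. The names dat and n_meta and the values are assumptions for the example (only two metadata columns rather than five), not your real data.

```r
# Hypothetical stand-in for the community data: two metadata columns, then counts
dat <- data.frame(
  Site       = factor(c("B07", "B08", "B07")),
  Time_point = factor(c("May", "May", "September")),
  Spionidae  = c(0, 0, 3),
  Syllidae   = c(8, 4, 6),
  Amphipoda  = c(75, 17, 34)
)

n_meta <- 2                            # number of leading metadata columns (assumed)
count_cols <- (n_meta + 1):ncol(dat)   # positions of the species columns

# Always keep the metadata columns; keep a species column only if its sum > 15
keep <- c(rep(TRUE, n_meta), colSums(dat[, count_cols]) > 15)

dat_reduced <- dat[, keep]
names(dat_reduced)
```

The logical vector has to be as long as the number of columns in the data frame being subset, which is why the TRUE values for the metadata columns are prepended.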

The key part to note is that in the square brackets, we want a vector of logicals, i.e. a vector of TRUE and FALSE. So if you wanted to subset using something a bit more complicated, then create a function that returns TRUE or FALSE and subset as usual.
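For example, a predicate function along these lines could be applied to every column with vapply(). This is only a sketch: keep_column and its threshold argument are hypothetical names, and it assumes every numeric column is a species count, which would not hold for a numeric metadata column such as Cage_num.

```r
# Hypothetical predicate: keep a column if it is non-numeric (metadata)
# or if its sum exceeds the threshold
keep_column <- function(x, threshold = 15) {
  !is.numeric(x) || sum(x, na.rm = TRUE) > threshold
}

# Made-up data: one factor column and two count columns
dat <- data.frame(
  Site      = factor(c("B07", "B08", "B07")),
  Spionidae = c(0, 0, 3),
  Amphipoda = c(75, 17, 34)
)

# vapply() calls the predicate once per column and returns a logical vector
keep <- vapply(dat, keep_column, logical(1))

dat_reduced <- dat[, keep, drop = FALSE]
names(dat_reduced)
```

Because the predicate checks the column type itself, there is no need to hard-code which column positions hold the counts.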

answered Sep 18 '22 17:09

csgillespie
