
Delete rows based on values in the columns and a threshold value

I have a table, the start is below:

                    SM_H1455  SM_H1456  SM_H1457  SM_H1461  SM_H1462  SM_H1463
ENSG00000001617.7          0         0         0         0         0         0
ENSG00000001626.9          0         0         0         0         0         0
ENSG00000002587.5         10         0         6         2         0         2
ENSG00000002726.15         8        14         0         2        16         2
ENSG00000002745.8          6         2         2         0         0         4

I want to delete rows in which >= 80% of the columns have the value 0. There are 6 columns here, so if 5 or more of the columns in a row have a 0, that row needs to be deleted.

I currently have this code:

data = data[!rowSums(data == 0), ]

But this code deletes every row that contains at least one 0, without taking the 80% threshold into account.

zfz asked Dec 09 '25 10:12


1 Answer

I think that @Hong Ooi's answer is incorrect in this case. This will give you the result that you have asked for:

data <- data[rowSums(data==0)/ncol(data) < 0.8, ]

data==0 returns a logical matrix that holds TRUE wherever the value at that location is equal to zero, and FALSE otherwise. Numerically, R treats TRUE as having a value of 1 and FALSE as having a value of 0.

rowSums adds up the numerical equivalents of the TRUE and FALSE values for each row of the result of data==0. rowSums(data==0) therefore gives the number of elements in each row of data that are zero.

ncol(data) is the number of columns in the original data object.

rowSums(data==0)/ncol(data) is therefore the proportion of elements equal to zero in each row.
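To make the intermediate values concrete, here is a small sketch that rebuilds the five rows shown in the question and prints each step. It assumes data is a data frame with the gene IDs as row names; the names is_zero, zero_count and zero_prop are only illustrative and are not part of the original answer.

```r
# Rebuild the five example rows from the question (assumed to be a data frame
# with the gene IDs as row names).
data <- data.frame(
  SM_H1455 = c(0, 0, 10, 8, 6),
  SM_H1456 = c(0, 0, 0, 14, 2),
  SM_H1457 = c(0, 0, 6, 0, 2),
  SM_H1461 = c(0, 0, 2, 2, 0),
  SM_H1462 = c(0, 0, 0, 16, 0),
  SM_H1463 = c(0, 0, 2, 2, 4),
  row.names = c("ENSG00000001617.7", "ENSG00000001626.9",
                "ENSG00000002587.5", "ENSG00000002726.15",
                "ENSG00000002745.8")
)

# Step 1: TRUE wherever a cell is zero
is_zero <- data == 0

# Step 2: number of zeros per row (TRUE counts as 1, FALSE as 0)
zero_count <- rowSums(is_zero)
print(zero_count)   # 6 6 2 1 2, in the row order above

# Step 3: proportion of zeros per row
zero_prop <- zero_count / ncol(data)
print(round(zero_prop, 2))
```

Only the first two rows reach the 80% mark here (all six values are zero); the other three rows have at most two zeros out of six.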

Finally, we can discard the rows where the above proportion is not less than 80% by filtering (using [] notation).
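As a quick check, the filter can be run on the five rows from the question. This is only a sketch under the assumption that data is a data frame with gene IDs as row names; the name filtered is illustrative. It also compares the result with rowMeans(data == 0), which should give the same proportions in one call.

```r
# Rebuild the five example rows from the question (assumed layout)
data <- data.frame(
  SM_H1455 = c(0, 0, 10, 8, 6),
  SM_H1456 = c(0, 0, 0, 14, 2),
  SM_H1457 = c(0, 0, 6, 0, 2),
  SM_H1461 = c(0, 0, 2, 2, 0),
  SM_H1462 = c(0, 0, 0, 16, 0),
  SM_H1463 = c(0, 0, 2, 2, 4),
  row.names = c("ENSG00000001617.7", "ENSG00000001626.9",
                "ENSG00000002587.5", "ENSG00000002726.15",
                "ENSG00000002745.8")
)

# Keep only rows where fewer than 80% of the values are zero
filtered <- data[rowSums(data == 0) / ncol(data) < 0.8, ]
print(rownames(filtered))
# "ENSG00000002587.5"  "ENSG00000002726.15" "ENSG00000002745.8"

# rowMeans() on the logical matrix yields the same proportions
same_props <- all.equal(rowSums(data == 0) / ncol(data), rowMeans(data == 0))
print(same_props)
```

With six columns, a row holding five zeros has a proportion of about 0.83, so it fails the < 0.8 test and is dropped, which matches the "5 or more" rule in the question.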

UPDATE: @Hong Ooi's edit means that their answer is also correct now.

CnrL answered Dec 12 '25 01:12


