
Remove rows in dataframe with factor ""

Tags: dataframe, r

I have a data frame like X below, where the column genes is a factor. I want to remove all rows where genes is empty. In table X, that means removing row 4. Is there a way to do this for a large data frame?

X 
names   values   genes
1 A  0.2876113  EEF1A1 
2 B  0.6681894   GAPDH
3 C  0.1375420 SLC35E2
4 D -1.9063386        
5 E -0.4949905   RPS28

Final result:

X 
names   values   genes
1 A  0.2876113  EEF1A1 
2 B  0.6681894   GAPDH
3 C  0.1375420 SLC35E2
5 E -0.4949905   RPS28

Thank you all!

Lisann asked Aug 17 '11 08:08


2 Answers

It's not completely obvious from your question what the empty values are, but you should be able to adapt the solution below (here I assume the 'empty' values are empty strings):

toBeRemoved<-which(X$genes=="")
X<-X[-toBeRemoved,]
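
For a reproducible check, the sketch below rebuilds the example table from the question and applies the same two steps. It assumes the blank entry in row 4 really is the empty string "" (not whitespace or NA); the name X_clean is only illustrative.

```r
# Rebuild the example data from the question; genes is stored as a factor.
# Assumption: the blank entry in row 4 is the empty string "".
X <- data.frame(
  names  = c("A", "B", "C", "D", "E"),
  values = c(0.2876113, 0.6681894, 0.1375420, -1.9063386, -0.4949905),
  genes  = factor(c("EEF1A1", "GAPDH", "SLC35E2", "", "RPS28"))
)

# Row positions whose genes value is the empty string
toBeRemoved <- which(X$genes == "")

# Negative indices drop those rows; this is safe here because there is a match
X_clean <- X[-toBeRemoved, ]
print(X_clean)
```

If the blanks turn out to be whitespace or NA instead, the comparison in which() would need to change accordingly.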

Nick Sabbe answered Oct 12 '22 21:10


@Nick Sabbe provided a great answer, but it has one caveat:

Using -which(...) is a neat trick to (sometimes) speed up the subsetting operation when there are only a few elements to remove.

...But if there are no elements to remove, it fails!

If X$genes does not contain any empty strings, which() returns an empty integer vector. Negating that still gives an empty vector, and X[integer(0), ] returns a data frame with zero rows!

toBeRemoved <- which(X$genes == "")
if (length(toBeRemoved) > 0) { # MUST check for 0-length
    X <- X[-toBeRemoved, ]
}
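
To see the pitfall in isolation, here is a minimal sketch using a small hypothetical data frame Y that has no empty strings in genes. The names Y, idx, unguarded and guarded are made up for illustration.

```r
# Hypothetical data frame with no empty strings in genes
Y <- data.frame(
  names = c("A", "B"),
  genes = factor(c("EEF1A1", "GAPDH"))
)

idx <- which(Y$genes == "")   # integer(0): nothing matched

# -integer(0) is still integer(0), so this selects no rows at all
unguarded <- Y[-idx, ]

# Guarded version: only drop rows when there is something to drop
guarded <- if (length(idx) > 0) Y[-idx, ] else Y

print(nrow(unguarded))
print(nrow(guarded))
```

The unguarded version silently discards every row, while the guarded one leaves Y unchanged.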

Or, if the speed gain isn't important, simply:

X<-X[X$genes!="",]

Or, as @nullglob pointed out,

subset(X, genes != "")
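
One more thing that may be worth checking, in case the column also contains NA values: logical indexing and subset() then behave differently, and the factor keeps "" as an unused level after the rows are gone. The sketch below uses a hypothetical data frame Z to show both points; whether this matters depends on your data.

```r
# Hypothetical data: one empty string and one NA in the factor column
Z <- data.frame(
  names = c("A", "B", "C", "D"),
  genes = factor(c("EEF1A1", "", NA, "RPS28"))
)

# Logical indexing: comparing NA yields NA, which produces an all-NA row
by_logical <- Z[Z$genes != "", ]

# subset() treats an NA condition as FALSE, so the NA row is dropped too
by_subset <- subset(Z, genes != "")

# Keep only rows that are neither NA nor empty, then drop the unused "" level
kept <- Z[!is.na(Z$genes) & Z$genes != "", ]
kept$genes <- droplevels(kept$genes)

print(levels(kept$genes))
```

Calling droplevels() is optional; it only matters if later code (for example table() or plotting) would otherwise show the empty level.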

Tommy answered Oct 12 '22 22:10