
New subset by selecting rows based on values of a vector in R

Tags: r

I have a data set U1. I run a classifier over it and get a vector of predicted labels:

> pred.U1.nb.c <- predict(NB.C, U1[,2:6])
> table(pred.U1.nb.c)
pred.U1.nb.c
    S unlabeled 
  148      5852 
> head(pred.U1.nb.c)
[1] S S S S S S
Levels: S unlabeled

Now I want to pull the rows of U1 that were classified as S into a new data frame, U1.S. What is the most efficient way to do this?

Tathagata asked Dec 10 '22 12:12


2 Answers

The answer by James has elegant economy going for it and would certainly work correctly with this example, but it is prone to undesirable results if the tested vector has any NAs: each NA in a logical index produces a row of all NAs in the result. (I have been bitten by this many times and been puzzled.) Here are two safer ways that avoid the NA-inclusive behavior of the "[" function:

U1[which(pred.U1.nb.c=="S"), ]

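Here which() converts the logical vector (possibly with NAs) into an integer vector holding only the positions of the TRUE values, so no NAs remain. You can also use subset, which treats an NA in the condition as FALSE: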

subset(U1, pred.U1.nb.c == "S")

EDIT: I suspect that using grepl would also avoid the NA concern. Perhaps:

U1[grepl("^S$", pred.U1.nb.c), ]
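To make the NA behavior concrete, here is a minimal sketch on toy data. The small data frame and the `pred` factor are hypothetical stand-ins for the question's U1 and pred.U1.nb.c, with one NA label added on purpose.

```r
# Toy stand-ins for the question's objects (hypothetical data).
U1 <- data.frame(id = 1:5, x = c(10, 20, 30, 40, 50))
pred <- factor(c("S", "unlabeled", NA, "S", "unlabeled"),
               levels = c("S", "unlabeled"))

# Plain logical indexing: the NA in the index yields an all-NA row.
by_logical <- U1[pred == "S", ]

# which() keeps only the positions where the test is TRUE.
by_which <- U1[which(pred == "S"), ]

# subset() treats NA in the condition as FALSE.
by_subset <- subset(U1, pred == "S")

# grepl() returns FALSE, not NA, for NA input.
by_grepl <- U1[grepl("^S$", pred), ]

nrow(by_logical)  # 3 rows, one of them all NA
nrow(by_which)    # 2 rows
```

With no NAs in the label vector, all four forms should select the same rows, so the difference only matters when the classifier can return missing predictions.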
IRTFM answered Dec 12 '22 01:12


U1[pred.U1.nb.c=="S",]
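As a minimal sketch of storing the result under the name U1.S that the question asks for, the following uses toy data. The data frame and label values are hypothetical, and the labels are assumed to contain no NAs.

```r
# Toy stand-ins (hypothetical data, no NAs in the labels).
U1 <- data.frame(id = 1:6, x = c(1.5, 2.5, 3.5, 4.5, 5.5, 6.5))
pred.U1.nb.c <- factor(c("S", "unlabeled", "S", "unlabeled", "unlabeled", "S"))

# Keep the matching rows in a new data frame, as the question asks.
U1.S <- U1[pred.U1.nb.c == "S", ]

# Sanity check: the number of rows kept should equal the count of "S" labels.
nrow(U1.S) == sum(pred.U1.nb.c == "S")
```

On the real data, the same check could be made against the 148 reported by table().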
James answered Dec 12 '22 02:12