Remove rows of a data frame if id is found in another list in R

Tags:

r

I have a data frame final where each observation has an ID in a column called final$workerId. I want to remove the rows of this data frame whose ID is found in another vector called omit. Here is what I've tried:

final <- read.csv("the data.csv")
omit <- c("A3E9N7HDRLT8KV","A39HQTITNY9TVJ","A272A0JGRTBFCR","A1QPHQ1C27ZFI7")
final <- final[,-final$workerId %in% omit]

I know how I could do it with a for loop, but I am looking for a solution without for loops if possible.

Ashish asked Sep 17 '25 11:09


2 Answers

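%in% returns a logical vector with one TRUE/FALSE per row. The opposite of a logical vector is found with !, not - (the minus sign is for dropping elements by numeric position), so final[!final$workerId %in% omit, ] is what you want. Note also that the row condition goes before the comma: final[, x] indexes columns, not rows.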
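A minimal, self-contained sketch of that fix, assuming a small made-up final data frame in place of the CSV (the IDs "KEEP0001"/"KEEP0002" and the score column are hypothetical, only the omit IDs come from the question):

```r
# Hypothetical stand-in for read.csv("the data.csv")
final <- data.frame(
  workerId = c("A3E9N7HDRLT8KV", "KEEP0001", "A1QPHQ1C27ZFI7", "KEEP0002"),
  score    = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
omit <- c("A3E9N7HDRLT8KV", "A39HQTITNY9TVJ", "A272A0JGRTBFCR", "A1QPHQ1C27ZFI7")

# %in% gives one TRUE/FALSE per row: TRUE where the ID appears in omit
drop <- final$workerId %in% omit
print(drop)

# Negate with ! and put the condition BEFORE the comma to select rows
kept <- final[!drop, ]
print(kept)
```

The kept rows retain their original row names (2 and 4), which is normal for data frame subsetting.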

You could also use which() to turn the logical vector into an integer index vector and then negate that with -, like this: final[-which(final$workerId %in% omit), ]. The first way is simpler, though.
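A small sketch comparing the which() form with the ! form on hypothetical toy data. It also shows a known edge case: when no ID matches, which() returns integer(0), and negating an empty index still selects nothing, so the which() form returns zero rows while the ! form keeps everything.

```r
# Hypothetical toy data frame; the IDs are made up for illustration
final <- data.frame(
  workerId = c("A3E9N7HDRLT8KV", "KEEP0001", "KEEP0002"),
  score    = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# which() converts the logical vector into integer row positions
idx <- which(final$workerId %in% c("A3E9N7HDRLT8KV"))
kept_which <- final[-idx, ]

# Edge case: nothing matches, so which() returns integer(0);
# -integer(0) is still integer(0), so zero rows are selected
idx_none <- which(final$workerId %in% c("NOT_PRESENT"))
empty <- final[-idx_none, ]

# The ! form keeps every row when nothing matches
all_rows <- final[!final$workerId %in% c("NOT_PRESENT"), ]

print(nrow(kept_which))
print(nrow(empty))
print(nrow(all_rows))
```

This edge case is one concrete reason to prefer the ! form when the omit list might not overlap the data at all.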

Example:

mtcars[!mtcars$cyl %in% c(4, 6), ]
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Gregor Thomas answered Sep 20 '25 02:09


Here's a dplyr solution that may be of interest. Its syntax follows the same logic as the base R attempt in your question.

library(dplyr)  # provides filter(); without it, filter() resolves to stats::filter
omit <- c("A3E9N7HDRLT8KV","A39HQTITNY9TVJ","A272A0JGRTBFCR","A1QPHQ1C27ZFI7")
final <- filter(final, !(workerId %in% omit))

dplyr's filter() keeps the rows for which a condition is TRUE. The condition here keeps the rows whose workerId is not (!) in (%in%) the vector omit. Because filter() evaluates its conditions inside the data frame given as its first argument, you can refer to the column workerId directly instead of writing final$workerId.
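For readers without dplyr installed, base R's subset() also evaluates its condition inside the data frame, so the same expression works unchanged. This is a swapped-in base R function, not the dplyr call above, and the toy data (including the hours column) is hypothetical:

```r
# Hypothetical toy data; only the omit IDs come from the question
final <- data.frame(
  workerId = c("A3E9N7HDRLT8KV", "KEEP0001", "A272A0JGRTBFCR"),
  hours    = c(1.5, 2.0, 3.25),
  stringsAsFactors = FALSE
)
omit <- c("A3E9N7HDRLT8KV", "A39HQTITNY9TVJ", "A272A0JGRTBFCR", "A1QPHQ1C27ZFI7")

# Like dplyr::filter(), subset() looks up workerId inside `final`,
# so there is no need to write final$workerId
kept <- subset(final, !(workerId %in% omit))
print(kept)
```

R's own help page for subset() describes it as a convenience function for interactive use and recommends standard [ indexing when programming, so the ! form from the first answer remains the safer choice inside functions.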

J.Q answered Sep 20 '25 03:09