
Remove rows from data.table in R based on values of several columns

Tags: r, data.table

I have a data.table in R which has several ids and a value. For each combination of ids, there are several rows. If one of these rows contains NA in the column 'value', I would like to remove all rows with this combination of ids. For example, in the table below, I would like to remove all rows for which id1 == 2 and id2 == 1.

If I had only one id, I would use dat[!(id1 %in% dat[is.na(value),id1])]. In the example, this would remove all rows where id1 == 2. However, I could not work out how to extend this to several columns.

library(data.table)

dat <- data.table(id1 = c(1,1,2,2,2,2),
                  id2 = c(1,2,1,2,3,1),
                  value = c(5,3,NA,6,7,3))
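
To make the over-removal concrete, here is a minimal sketch that runs the single-id filter on the example data. The name `one_id` is only illustrative and does not come from the question.

```r
library(data.table)

dat <- data.table(id1 = c(1, 1, 2, 2, 2, 2),
                  id2 = c(1, 2, 1, 2, 3, 1),
                  value = c(5, 3, NA, 6, 7, 3))

# Single-id filter: drops every row whose id1 appears in any NA row
one_id <- dat[!(id1 %in% dat[is.na(value), id1])]

# Only the id1 == 1 rows remain; the (2, 2) and (2, 3) rows are lost
# even though neither of those combinations contains an NA
print(one_id)
```

The filter is too coarse here because it matches on id1 alone, whereas the goal is to drop only the combination id1 == 2, id2 == 1.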
lilaf asked Jan 17 '15 17:01




1 Answer

If you want to check, for each combination of id1 and id2, whether any of the values is NA and then remove that whole combination, you can put an if statement in j and return the group's rows (using .SD) only when the group contains no NA. For a group where the condition is FALSE, the if returns NULL, so that group is left out of the result.

dat[, if(!anyNA(value)) .SD, by = .(id1, id2)]
#    id1 id2 value
# 1:   1   1     5
# 2:   1   2     3
# 3:   2   2     6
# 4:   2   3     7

Or similarly,

dat[, if(all(!is.na(value))) .SD, by = .(id1, id2)]
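
A different route, not part of the answer above, is an anti-join on the key columns. The sketch below assumes a data.table version that supports the `on` argument (1.9.6 or later); the names `bad` and `res` are only illustrative.

```r
library(data.table)

dat <- data.table(id1 = c(1, 1, 2, 2, 2, 2),
                  id2 = c(1, 2, 1, 2, 3, 1),
                  value = c(5, 3, NA, 6, 7, 3))

# Key combinations that have at least one NA in 'value'
bad <- unique(dat[is.na(value), .(id1, id2)])

# Anti-join: keep the rows of dat whose (id1, id2) has no match in 'bad'
res <- dat[!bad, on = c("id1", "id2")]

print(res)
```

This keeps the remaining rows in their original order and needs no per-group evaluation of j, which may matter on tables with many small groups.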
David Arenburg answered Nov 14 '22 22:11
