I have a data.table in R which has several ids and a value. For each combination of ids, there are several rows. If one of these rows contains NA in the column 'value', I would like to remove all rows with this combination of ids. For example, in the table below, I would like to remove all rows for which id1 == 2
and id2 == 1
.
If I had only one id I would do dat[!(id1 %in% dat[is.na(value),id1])]
. In the example, this would remove all rows where i1 == 2. However, I did not manage to include several columns.
dat <- data.table(id1 = c(1,1,2,2,2,2),
id2 = c(1,2,1,2,3,1),
value = c(5,3,NA,6,7,3))
For example, we can use the subset() function if we want to drop a row based on a condition. If we prefer to work with the Tidyverse package, we can use the filter() function to remove (or select) rows based on values in a column (conditionally, that is, and the same as using subset).
To remove rows from a data frame that exists in another data frame, we can use subsetting with single square brackets. This removal will help us to find the unique rows in the data frame based on the column of another data frame.
To remove a character in an R data frame column, we can use gsub function which will replace the character with blank. For example, if we have a data frame called df that contains a character column say x which has a character ID in each value then it can be removed by using the command gsub("ID","",as.
If you want to check per combination of id1
and id2
if any of the values are NA
s and then remove that whole combination, you can insert an if
statement per group and only retrieve the results (using .SD
) if that statement returns TRUE
.
dat[, if(!anyNA(value)) .SD, by = .(id1, id2)]
# id1 id2 value
# 1: 1 1 5
# 2: 1 2 3
# 3: 2 2 6
# 4: 2 3 7
Or similarly,
dat[, if(all(!is.na(value))) .SD, by = .(id1, id2)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With