Remove rows from data.table in R based on values of several columns

Tags:

r

data.table

I have a data.table in R with several id columns and a value column. Each combination of ids appears in several rows. If any of these rows contains NA in the column 'value', I would like to remove all rows with that combination of ids. For example, in the table below, I would like to remove all rows for which id1 == 2 and id2 == 1.

If I had only one id, I would do dat[!(id1 %in% dat[is.na(value), id1])]. In the example, this would remove all rows where id1 == 2. However, I did not manage to extend this to several id columns.

library(data.table)

dat <- data.table(id1 = c(1,1,2,2,2,2),
                  id2 = c(1,2,1,2,3,1),
                  value = c(5,3,NA,6,7,3))
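The single-id %in% filter can be generalized to several id columns with a data.table anti-join (X[!Y, on = ...]). This is a minimal sketch, not part of the original answer; the helper names bad and res are illustrative only. It first collects the (id1, id2) pairs that contain an NA, then keeps the rows of dat whose pair is not among them.

```r
library(data.table)

dat <- data.table(id1 = c(1,1,2,2,2,2),
                  id2 = c(1,2,1,2,3,1),
                  value = c(5,3,NA,6,7,3))

# (id1, id2) combinations with at least one NA in 'value' (hypothetical helper name)
bad <- unique(dat[is.na(value), .(id1, id2)])

# Anti-join: keep only rows whose (id1, id2) pair does not occur in 'bad'
res <- dat[!bad, on = c("id1", "id2")]
res
```

An anti-join returns the surviving rows of dat in their original order, which mirrors the behaviour of the single-column !(id1 %in% ...) filter.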


asked Jan 17 '15 17:01

lilaf

1 Answer

If you want to check, for each combination of id1 and id2, whether any of the values is NA and then remove that whole combination, you can put an if statement in j, evaluated per group, and return the group's rows (.SD) only when the condition is TRUE. When the condition is FALSE, the if returns NULL and data.table drops that group from the result.

dat[, if(!anyNA(value)) .SD, by = .(id1, id2)]
#    id1 id2 value
# 1:   1   1     5
# 2:   1   2     3
# 3:   2   2     6
# 4:   2   3     7

Or similarly,

dat[, if(all(!is.na(value))) .SD, by = .(id1, id2)]
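A related variant, sketched here as an assumption rather than taken from the answer, uses .I instead of .SD: each group returns its row numbers only when it has no NA, and dat is then subset by those row numbers. Unlike the .SD version, the result keeps the original column layout (with by, the grouping columns are placed first) and, after sort(), the original row order. The name keep is illustrative only.

```r
library(data.table)

dat <- data.table(id1 = c(1,1,2,2,2,2),
                  id2 = c(1,2,1,2,3,1),
                  value = c(5,3,NA,6,7,3))

# Per (id1, id2) group, return the group's row numbers (.I) only if no value is NA;
# .I[FALSE] is integer(0), so groups containing an NA contribute no rows
keep <- dat[, .I[!anyNA(value)], by = .(id1, id2)]$V1

# sort() restores the original row order when a group's rows are not contiguous
res <- dat[sort(keep)]
res
```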

answered Nov 14 '22 22:11

David Arenburg
