
Using `:=` in data.table to sum the values of two columns in R, ignoring NAs


I have what I think is a very simple question related to the use of data.table and the := function. I don't think I quite understand the behaviour of := and often I run into similar problems.

Here is some example data

library(data.table)
# 5 rows of example data; col1 is NA in the first row
mat <- data.table(col1 = c(NA, 0, -0.015038, 0.003817, -0.011407),
                  col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491))

which gives

> mat
        col1      col2
1:        NA  0.003745
2:  0.000000  0.007463
3: -0.015038 -0.007407
4:  0.003817 -0.003731
5: -0.011407 -0.007491

I want to create a column called col3 which gives the sum of col1 and col2. If I use

mat[, col3 := col1 + col2]
#        col1      col2      col3
#1:        NA  0.003745        NA
#2:  0.000000  0.007463  0.007463
#3: -0.015038 -0.007407 -0.022445
#4:  0.003817 -0.003731  0.000086
#5: -0.011407 -0.007491 -0.018898

then I get an NA for the first row, but I want NAs to be ignored. So I tried instead

mat[, col3 := sum(col1, col2, na.rm = TRUE)]
#        col1      col2      col3
#1:        NA  0.003745 -0.030049
#2:  0.000000  0.007463 -0.030049
#3: -0.015038 -0.007407 -0.030049
#4:  0.003817 -0.003731 -0.030049
#5: -0.011407 -0.007491 -0.030049

which is not what I am after, since it gives me the sum of all the elements of col1 and col2. I think I don't quite get :=... How can I get the element-wise sum of col1 and col2, ignoring NA values?

Not sure this is relevant, but here is my sessionInfo

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.8.3


asked Oct 28 '12 05:10

Vivi

1 Answer

This is standard R behaviour and has nothing to do with data.table.

Adding anything to NA returns NA:

NA + 1 ## NA
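A quick base-R check of this rule (no data.table involved), using the first two values of the example columns. The variable names x, y and s are only for illustration:

```r
# Arithmetic with NA propagates NA element by element
x <- c(NA, 0)                 # first two values of col1
y <- c(0.003745, 0.007463)    # first two values of col2
s <- x + y
print(s)        # NA for element 1, 0.007463 for element 2
is.na(NA + 1)   # TRUE: adding to NA always gives NA
```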

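sum, on the other hand, returns a single number: it adds up every element of all its arguments, and := then recycles that one value down every row.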
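To make the difference concrete, here is a minimal base-R sketch contrasting the vectorised + with the aggregating sum() on the example values. The recycling of a length-1 result to every row is what data.table's := does with a length-1 right-hand side; rep() below only imitates that step:

```r
col1 <- c(NA, 0, -0.015038, 0.003817, -0.011407)
col2 <- c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491)

# `+` is vectorised: one result per row (NA where either input is NA)
elementwise <- col1 + col2
# sum() is an aggregate: it pools every element of every argument
total <- sum(col1, col2, na.rm = TRUE)

length(elementwise)  # 5
length(total)        # 1
# A length-1 value assigned to a 5-row column is repeated on every row,
# which is why every row of col3 showed the same number
rep(total, length(col1))
```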

If you want 1 + NA to return 1, you have to handle the rows where col1 or col2 is NA explicitly, for example:

mat[, col3 := col1 + col2]        # NA wherever either column is NA
mat[is.na(col1), col3 := col2]    # col1 missing: use col2 alone
mat[is.na(col2), col3 := col1]    # col2 missing: use col1 alone
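The same three steps can be sketched on plain vectors in base R, which makes it easy to check the result without data.table. This is a translation for illustration, not the data.table code itself:

```r
col1 <- c(NA, 0, -0.015038, 0.003817, -0.011407)
col2 <- c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491)

col3 <- col1 + col2                      # NA wherever either input is NA
col3[is.na(col1)] <- col2[is.na(col1)]   # col1 missing: fall back to col2
col3[is.na(col2)] <- col1[is.na(col2)]   # col2 missing: fall back to col1
col3
```

Note that a row where both col1 and col2 are NA stays NA with this approach.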

EDIT - an easier solution

You could also use rowSums, which has a na.rm argument

mat[, col3 := rowSums(.SD, na.rm = TRUE), .SDcols = c("col1", "col2")]

rowSums is what you want: by definition, it gives the row-wise sums of a matrix containing col1 and col2, and na.rm = TRUE removes the NA values within each row before summing.

(@JoshuaUlrich suggested this in a comment.)
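A base-R sketch of the rowSums approach on a plain data frame, assuming the same five rows plus a hypothetical sixth row where both columns are NA (added only to show the edge case):

```r
df <- data.frame(
  col1 = c(NA, 0, -0.015038, 0.003817, -0.011407, NA),
  col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491, NA)
)
# rowSums() works row by row; na.rm = TRUE drops NAs inside each row
df$col3 <- rowSums(df[, c("col1", "col2")], na.rm = TRUE)
df
```

One difference from the three-step version: when every value in a row is NA, rowSums(..., na.rm = TRUE) returns 0 for that row rather than NA.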

answered Sep 20 '22 17:09

mnel
