I have a data frame in R whose columns I want to order by column sum, in decreasing order. The values range from -1 to +1. This code does that almost perfectly:
DF<-DF[, order(colSums(-DF))]
However, I do have some NA values spread out amongst the data (no single column or row is all NA so I cannot simply remove an entire column or row). I believe that the data is not being sorted properly, as columns that contain NAs are not sorted, and just placed behind the sorted columns.
Is there a way to order the data by sum of columns as above, but also allowing the sorting of columns with NAs as well?
If I understand you correctly, you want to sort "NA columns" behind "non-NA columns", but then you also want to sort the NA columns amongst themselves based on the result of colSums() applied to the non-NA cells within the NA columns. You can do this with an additional argument to order() to break ties in which you call colSums() with the additional argument na.rm=TRUE. Here's a demo with 4 columns total, 2 with NAs, 2 without:
set.seed(3L)
df <- setNames(rev(as.data.frame(replicate(4L,
    sample(c(seq(-1,1,0.5),NA),
        5L,replace=TRUE)))),letters[1:4])
df ## columns a and b are "NA columns", columns c and d are "non-NA columns"
## a b c d
## 1 1.0 0.5 0.5 -0.5
## 2 -1.0 0.5 -1.0 1.0
## 3 1.0 0.5 -0.5 0.0
## 4 NA 0.5 0.5 -0.5
## 5 -0.5 NA 0.5 0.5
colSums(-df) ## d should be moved before c, but can't tell yet about a and b
## a b c d
## NA NA 0.0 -0.5
colSums(-df,na.rm=TRUE) ## this can tiebreak a and b; b should be moved before a
## a b c d
## -0.5 -2.0 0.0 -0.5
df[,order(colSums(-df))] ## fails to order NA columns
## d c a b
## 1 -0.5 0.5 1.0 0.5
## 2 1.0 -1.0 -1.0 0.5
## 3 0.0 -0.5 1.0 0.5
## 4 -0.5 0.5 NA 0.5
## 5 0.5 0.5 -0.5 NA
df[,order(colSums(-df),colSums(-df,na.rm=TRUE))] ## tiebreaker orders NA columns properly
## d c b a
## 1 -0.5 0.5 0.5 1.0
## 2 1.0 -1.0 0.5 -1.0
## 3 0.0 -0.5 0.5 1.0
## 4 -0.5 0.5 0.5 NA
## 5 0.5 0.5 NA -0.5
Sorry, I misunderstood. Looks like this is what you're looking for:
df[,order(colSums(-df,na.rm=TRUE))]
## b a d c
## 1 0.5 1.0 -0.5 0.5
## 2 0.5 -1.0 1.0 -1.0
## 3 0.5 1.0 0.0 -0.5
## 4 0.5 NA -0.5 0.5
## 5 NA -0.5 0.5 0.5
Note that, for a sum, passing na.rm=TRUE is equivalent to treating NAs as zero, contrary to your proviso that regarding NAs as zero would mess up the sorting.
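To check the claim that na.rm=TRUE behaves like replacing NAs with zero as far as the column sums go, here is a minimal sketch. The data frame `toy` and its values are made up for illustration and are not the data from the question.

```r
# Hypothetical toy data (not from the question): values in [-1, 1] with some NAs.
toy <- data.frame(
  a = c(1, -1, NA),
  b = c(0.5, NA, 0.5),
  c = c(-0.5, 0.5, 0.5)
)

# Column sums that skip the NA cells.
sums_na_rm <- colSums(toy, na.rm = TRUE)

# Column sums after explicitly replacing every NA with zero.
toy_zero <- toy
toy_zero[is.na(toy_zero)] <- 0
sums_zero <- colSums(toy_zero)

# Both approaches should give the same sums, and therefore the same column order.
same_sums  <- isTRUE(all.equal(sums_na_rm, sums_zero))
same_order <- identical(order(-sums_na_rm), order(-sums_zero))
```

If both `same_sums` and `same_order` come out TRUE, sorting on `colSums(..., na.rm=TRUE)` and sorting on zero-filled data are interchangeable for this purpose.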
To sort columns that contain NAs together with the other columns, pass na.rm=TRUE to colSums(). Without it, colSums() returns NA for every column that contains an NA, and order() places those columns last, in their original relative order. The final code is:
DF <- DF[, order(colSums(-DF, na.rm=TRUE))]
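Here is a small, deterministic sketch of the same one-liner, so the result can be checked by hand. The data frame and the column names `p`, `q`, `r` are hypothetical. The `drop = FALSE` argument is my addition: it is not needed for the question as asked, but it keeps the result a data frame even if `DF` happens to have a single column.

```r
# Hypothetical example data; no column is entirely NA.
DF <- data.frame(
  p = c(0.5, NA, -1),  # sum of non-NA cells: -0.5
  q = c(1, 1, NA),     # sum of non-NA cells:  2
  r = c(0, 0.5, 0.5)   # sum of non-NA cells:  1
)

# Column totals, ignoring NA cells.
col_totals <- colSums(DF, na.rm = TRUE)

# Negating the totals makes order() sort them in decreasing order.
DF_sorted <- DF[, order(-col_totals), drop = FALSE]

names(DF_sorted)  # "q" "r" "p"
```

Columns with NAs (`p` and `q`) are now ranked alongside the complete column `r` rather than being pushed to the end.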