I am using R. I have searched for an answer and found similar questions, but none of the solutions worked for my specific problem.
In my data set I am using NA's as placeholders because I am going to return to them once I get part of my analysis done. I would therefore like to do all my calculations as if the NA's weren't really there.
Here's my issue, with an example data table:
ROCA = c(1,3,6,2,1,NA,2,NA,1,NA,4,NA)
ROCA <- data.frame (ROCA=ROCA) # converting it just because that is the format of my original data
#Now my function
exceedes <- function (L=NULL, R=NULL, na.rm = T)
{
if (is.null(L) | is.null(R)) {
print ("mycols: invalid L,R.")
return (NULL)
}
test <-(mean(L, na.rm=TRUE)-R*sd(L,na.rm=TRUE))
test1 <- sapply(L,function(x) if((x)> test){1} else {0})
return (test1)
}
L=ROCA[,1]
R=.5
ROCA$newcolumn <- exceedes(L,R)
names(ROCA)[names(ROCA)=="newcolumn"]="Exceedes1"
I am getting the error:
Error in if ((x) > test) { : missing value where TRUE/FALSE needed
As you can see, something goes wrong inside the sapply call. Any ideas on how to ignore those NA's? I would try na.omit if I could get it to insert all the NA's right back where they were before, but I am not sure how to do that.
First, to exclude missing values from a mathematical operation, use the na.rm = TRUE argument. Without it, most summary functions such as mean() and sd() return NA when the input contains a missing value. We may also want to subset our data to complete observations, that is, the rows that contain no missing data.
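As a small sketch of both ideas, the vector x and the data frame df below are made-up illustrations, not the asker's data. complete.cases() is used here as one base R way to keep only the rows without missing values.

```r
# Hypothetical example vector with one missing value
x <- c(1, 3, 6, NA, 2)

mean(x)                 # NA, because one value is missing
mean(x, na.rm = TRUE)   # 3, computed from the four observed values

# Hypothetical data frame with a missing value in two different rows
df <- data.frame(a = c(1, NA, 3), b = c("u", "v", NA))

# complete.cases() returns TRUE for rows that contain no NA at all
complete_rows <- df[complete.cases(df), ]
complete_rows
```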
To remove all rows that contain an NA, we can use the na.omit function. For example, if a data frame called df contains some NA values, na.omit(df) removes every row that contains at least one NA.
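A minimal sketch of na.omit on a made-up data frame (df and its columns are assumptions for illustration). The result also carries an "na.action" attribute that records which rows were dropped, which can help if the rows need to be matched back up later.

```r
# Hypothetical data frame: rows 2 and 3 each contain one NA
df <- data.frame(a = c(1, NA, 3, 4), b = c(10, 20, NA, 40))

clean <- na.omit(df)
nrow(clean)   # 2 rows remain

# Positions of the rows that na.omit removed
dropped <- as.integer(attr(clean, "na.action"))
dropped
```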
In R, missing values are represented by the symbol NA (not available). Undefined numeric results (e.g., 0/0) are represented by the symbol NaN (not a number); dividing a nonzero number by zero gives Inf or -Inf instead. Unlike SAS, R uses the same symbol for missing character and numeric data.
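The distinction can be checked directly at the console. This is only an illustrative sketch; the variable names are arbitrary.

```r
undefined <- 0 / 0   # NaN: the result is not a number
infinite  <- 1 / 0   # Inf: division of a nonzero number by zero

is.na(NaN)    # TRUE: NaN also counts as missing for is.na()
is.nan(NA)    # FALSE: NA is missing, but it is not NaN

# The same NA symbol works in character and numeric vectors
chars <- c("a", NA)
nums  <- c(1, NA)
is.na(chars)
is.na(nums)
```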
In R, an easy way to find the columns that contain missing values is to combine is.na() and colSums(). First, you count the number of NA's per column. Then, you use a function such as names() or colnames() to return the names of the columns with at least one missing value.
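The two steps might look like this on a made-up data frame (df and its column names are assumptions for illustration):

```r
# Hypothetical data frame: columns a and c contain NA, column b does not
df <- data.frame(a = c(1, NA, 3),
                 b = c(1, 2, 3),
                 c = c(NA, NA, "z"))

# Step 1: count the missing values in each column
na_counts <- colSums(is.na(df))
na_counts

# Step 2: keep the names of the columns with at least one NA
cols_with_na <- names(na_counts)[na_counts > 0]
cols_with_na   # "a" "c"
```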
There's no need for sapply and your anonymous function because > is already vectorized.
It also seems really odd to specify default argument values that are invalid. My guess is that you're using that as a kludge instead of using the missing function. It's also good practice to throw an error rather than return NULL, because otherwise the caller still has to check whether the function returned NULL.
exceedes <- function (L, R, na.rm=TRUE)
{
if(missing(L) || missing(R)) {
stop("L and R must be provided")
}
# pass the na.rm argument through instead of hard-coding TRUE
test <- mean(L,na.rm=na.rm)-R*sd(L,na.rm=na.rm)
as.numeric(L > test)
}
ROCA <- data.frame(ROCA=c(1,3,6,2,1,NA,2,NA,1,NA,4,NA))
ROCA$Exceeds1 <- exceedes(ROCA[,1],0.5)
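To see why the vectorized comparison solves the problem, here is a small self-contained sketch that reuses the question's data and R = 0.5. The names flags and err_msg are only illustrative. Comparing NA with a number yields NA, which if() cannot handle, while the vectorized > simply leaves NA in the same position.

```r
L <- c(1, 3, 6, 2, 1, NA, 2, NA, 1, NA, 4, NA)

# Threshold: mean minus 0.5 standard deviations, ignoring NA
test <- mean(L, na.rm = TRUE) - 0.5 * sd(L, na.rm = TRUE)

# if() needs a single TRUE or FALSE, so an NA condition raises an error
err_msg <- tryCatch(if (NA > test) 1 else 0,
                    error = function(e) conditionMessage(e))
err_msg

# The vectorized comparison keeps each NA where it was
flags <- as.numeric(L > test)
flags
# 0 1 1 1 0 NA 1 NA 0 NA 1 NA
```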
This statement is strange:
test1 <- sapply(L,function(x) if((x)> test){1} else {0})
Try:
test1 <- ifelse(is.na(L), NA, ifelse(L > test, 1, 0))
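As a rough check under the question's data and R = 0.5, the nested ifelse can be compared with a single ifelse and with the plain vectorized comparison. The names nested, single and direct are only illustrative. Because ifelse already returns NA wherever its test is NA, the explicit is.na() branch should not change the result.

```r
L <- c(1, 3, 6, 2, 1, NA, 2, NA, 1, NA, 4, NA)
test <- mean(L, na.rm = TRUE) - 0.5 * sd(L, na.rm = TRUE)

# Version from the answer: handle NA explicitly, then compare
nested <- ifelse(is.na(L), NA, ifelse(L > test, 1, 0))

# ifelse() propagates NA from the test on its own
single <- ifelse(L > test, 1, 0)

# Plain vectorized comparison converted to 0/1
direct <- as.numeric(L > test)

isTRUE(all.equal(nested, single))
isTRUE(all.equal(single, direct))
```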
Do you want NA's in the result? That is, do you want the rows to line up? If so, it seems like just returning L > test would work. And adding the column can be simplified too (I suspect "Exceedes1" is in a variable somewhere).
exceedes <- function (L=NULL, R=NULL, na.rm = T)
{
if (is.null(L) | is.null(R)) {
print ("mycols: invalid L,R.")
return (NULL)
}
test <-(mean(L, na.rm=TRUE)-R*sd(L,na.rm=TRUE))
L > test
}
L=ROCA[,1]
R=.5
ROCA[["Exceedes1"]] <- exceedes(L,R)