 

How to pass na.rm as argument to tapply?

Tags: r, tapply, na.rm

I'd like to calculate the mean and sd from a data frame with one column for the parameter and one column for a group identifier. How can I calculate them when using tapply? For a plain vector I could use sd(v1, na.rm=TRUE), but I can't fit the na.rm=TRUE into the statement when using tapply. na.omit is not an option: I have a whole bunch of parameters and have to go through them step by step without losing half of the data frame by excluding every row that has a single missing value.

data("weightgain", package = "HSAUR")
tapply(weightgain$weightgain, list(weightgain$source, weightgain$type), mean)

The same problem arises with by():

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
y <- c(2, 3, NA, 3, 4, NA, 2, 3, NA, 2)
group <- rep(factor(LETTERS[1:2]), 5)
df <- data.frame(x, y, group)
df

by(df$x,df$group,summary)
by(df$x,df$group,mean)

sd(df$x) #result: NA
sd(df$x, na.rm=TRUE) #result: 2.738613

Any ideas how to get this done?

Doc asked Jan 05 '13 14:01


People also ask

How do I remove NA values in tapply?

Suppose that your data frame contains some NA values in its columns. Within the tapply function you can specify additional arguments for the function you are applying, after the FUN argument. In this case, the mean function accepts the na.rm argument to remove NA values.
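As a minimal sketch of that pattern, using made-up `scores` and `grp` vectors that are not part of the original question, arguments placed after FUN are forwarded to it:

```r
# Hypothetical toy data (not from the question): one NA in group "A".
scores <- c(1, 2, NA, 4, 5, 6)
grp    <- factor(c("A", "B", "A", "B", "A", "B"))

# Without na.rm, any group containing an NA yields NA.
with_na <- tapply(scores, grp, mean)

# Arguments after FUN are forwarded to mean(), so NAs are dropped per group.
without_na <- tapply(scores, grp, mean, na.rm = TRUE)

print(with_na)
print(without_na)
```

Only the missing observation itself is ignored; the other values in group "A" still contribute to its mean.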

What does tapply() do in R?

tapply() applies a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain factors. In other words, tapply() applies a function to subsets of a vector defined by one or more factor variables.
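One way to picture this is as "split, then apply". The sketch below uses invented `amount` and `region` vectors (assumptions for illustration only) and compares tapply() with a roughly equivalent split()/sapply() pipeline:

```r
# Hypothetical toy data for illustration only.
amount <- c(10, 20, 30, 40, 50)
region <- factor(c("a", "b", "a", "b", "a"))

# tapply() groups `amount` by `region` and applies sum() to each group.
by_tapply <- tapply(amount, region, sum)

# Roughly the same computation spelled out: split into groups, then apply.
by_split <- sapply(split(amount, region), sum)

print(by_tapply)
print(by_split)
```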

What is the output of tapply?

By default (simplify = TRUE), if the applied function returns a single value per group, tapply returns an array of those values, which for a single grouping factor prints like a named vector. In this case we are applying the mean function, so the output of tapply is numeric.
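The shape of the result depends on the number of grouping factors and on what the applied function returns. A small sketch, with invented `value`, `f1` and `f2` vectors that do not come from the question:

```r
# Hypothetical toy data for illustration only.
value <- c(1, 2, 3, 4, 5, 6, 7, 8)
f1    <- factor(rep(c("a", "b"), times = 4))
f2    <- factor(rep(c("x", "y"), each = 4))

# One grouping factor, scalar FUN: a named one-dimensional numeric result.
one_factor <- tapply(value, f1, mean)

# Two grouping factors: one cell per combination of levels, i.e. a matrix.
two_factors <- tapply(value, list(f1, f2), mean)

# FUN returning more than one value per group: the result holds list elements.
ranges <- tapply(value, f1, range)

print(one_factor)
print(dim(two_factors))
print(is.list(ranges))
```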

Why is it necessary to add the option na.rm = TRUE?

The na.rm argument gives a simple way of removing missing values from data if they are coded as NA. In base R its standard default value is FALSE, meaning NAs are not removed.
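The default can be seen directly on the `x` vector from the question; this is only a small illustration of how the NA propagates unless na.rm = TRUE is given:

```r
# The x vector from the question: nine values plus one NA.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)

mean(x)                # NA, because na.rm defaults to FALSE
mean(x, na.rm = TRUE)  # 5
sum(x, na.rm = TRUE)   # 45
max(x, na.rm = TRUE)   # 9
```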


1 Answer

Simply pass na.rm=TRUE after the function in the tapply call; tapply forwards any extra arguments to the function being applied (here, mean):

tapply(weightgain$weightgain, list(weightgain$source, weightgain$type), mean, na.rm=TRUE)
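The same approach should carry over to sd and to by(), which the question also asks about. The following sketch reuses the `df` data frame from the question (so it does not need the HSAUR package); the result names `sd_x`, `mean_x` and `mean_y` are introduced here for illustration only:

```r
# Data frame from the question.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
y <- c(2, 3, NA, 3, 4, NA, 2, 3, NA, 2)
group <- rep(factor(LETTERS[1:2]), 5)
df <- data.frame(x, y, group)

# sd per group; na.rm = TRUE is forwarded to sd() through tapply()'s `...`.
sd_x <- tapply(df$x, df$group, sd, na.rm = TRUE)

# by() forwards extra arguments to FUN in the same way.
mean_x <- by(df$x, df$group, mean, na.rm = TRUE)

# An anonymous function is an alternative when more control is needed.
mean_y <- tapply(df$y, df$group, function(v) mean(v, na.rm = TRUE))

print(sd_x)
print(mean_x)
print(mean_y)
print(nrow(df))  # still 10: no rows were dropped
```

Because the missing values are ignored separately in each call, a row with a missing y still contributes its x value, which avoids the row loss that na.omit on the whole data frame would cause.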
user4196706 answered Oct 14 '22 23:10