 

summarize_all with "n()" function

Tags:

r

dplyr

summary

I'm summarizing a data frame in dplyr with the summarize_all() function. If I do the following:

summarize_all(mydf, list(mean="mean", median="median", sd="sd"))

I get a tibble with 3 variables for each of my original measures, all suffixed by the type (mean, median, sd). Great! But when I try to capture the within-vector n's to calculate the standard deviations myself and to make sure missing cells aren't counted...

summarize_all(mydf, list(mean="mean", median="median", sd="sd", n="n"))

...I get an error:

Error in (function ()  : unused argument (var_a)

This is not an issue with my var_a vector. If I remove it, I get the same error for var_b, and so on. summarize_all() fails the same way whenever I request n or n(), and also when I pass the descriptives I want through the .funs argument instead.

What's going on?

asked Sep 03 '25 05:09 by J.Q



1 Answer

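The problem is that n() takes no arguments, unlike mean() and median(). summarize_all() calls each function with the column as its first argument, so it ends up calling n(var_a), which fails with "unused argument (var_a)". Use length() instead to get the count for each column: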

summarize_all(mydf, list(mean="mean", median="median", sd="sd", n="length"))
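One caveat: length() counts NA entries too, so on its own it does not exclude missing cells. A minimal sketch, assuming a hypothetical toy mydf whose columns var_a and var_b contain NAs, that counts only non-missing values with sum(!is.na(.)) and passes na.rm = TRUE to the other summaries:

```r
library(dplyr)

# Hypothetical toy data; var_a and var_b stand in for the asker's columns.
mydf <- data.frame(
  var_a = c(1, 2, NA, 4),
  var_b = c(10, NA, NA, 40)
)

# summarize_all() calls each function with the column as its first argument,
# so every entry must accept one. length() would count the NAs as well, so a
# lambda that counts only non-missing values is used for n instead.
stats <- summarize_all(
  mydf,
  list(
    mean   = ~mean(., na.rm = TRUE),
    median = ~median(., na.rm = TRUE),
    sd     = ~sd(., na.rm = TRUE),
    n      = ~sum(!is.na(.))
  )
)

# Columns are named <column>_<function>, e.g. var_a_n, var_b_mean.
print(stats)
```

Here var_a_n is 3 and var_b_n is 2, whereas length() would report 4 for both. In dplyr 1.0 and later, summarize_all() is superseded by across(), and the same named list of lambdas can be passed as summarise(mydf, across(everything(), list(...))).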
answered Sep 04 '25 23:09 by Artem Sokolov
