I am using the mtcars dataset. I want to find the number of records for a particular combination of data, something very similar to the count(*) with a GROUP BY clause in SQL. ddply() from plyr is working for me:

    library(plyr)
    ddply(mtcars, .(cyl, gear), nrow)

has output

      cyl gear V1
    1   4    3  1
    2   4    4  8
    3   4    5  2
    4   6    3  2
    5   6    4  4
    6   6    5  1
    7   8    3 12
    8   8    5  2
Using this code

    library(dplyr)
    g <- group_by(mtcars, cyl, gear)
    summarise(g, length(gear))

has output

      length(cyl)
    1          32
I found various functions to pass in to summarise(), but none seem to work for me. One function I found is sum(G), which returned

    Error in eval(expr, envir, enclos) : object 'G' not found

I tried using n(), which returned

    Error in n() : This function should not be called directly

What am I doing wrong? How can I get group_by() / summarise() to work for me?
The count() function can be applied to a data frame using one or more columns and returns a frequency count for each group. The result contains the columns that were counted by, plus a count column (named n by default).
count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n = n()).
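As a rough illustration of that equivalence, the sketch below computes the same counts three ways on mtcars. It assumes dplyr is installed; the variable names by_count, by_summarise and by_tally are made up for this example.

```r
library(dplyr)

# count() on two variables
by_count <- mtcars %>% count(cyl, gear)

# The longer form: group first, then summarise with n()
by_summarise <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n())

# tally() counts the rows of an already grouped data frame
by_tally <- mtcars %>%
  group_by(cyl, gear) %>%
  tally()

# All three approaches should report the same counts per group
identical(by_count$n, by_summarise$n)
identical(by_count$n, by_tally$n)
```

Recent dplyr versions may print an informational message about the remaining grouping after summarise(); it does not affect the counts.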
To get the number of rows in an R data frame, call nrow() with the data frame as its argument. nrow() is part of base R.
Similarly, the ncol() function returns the total number of columns in the object.
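A short base R sketch of these two functions on mtcars, which ships with R. The variable names are made up for this example, and the last line is only one possible way to count a single cyl/gear combination without any packages.

```r
# Dimensions of the built-in mtcars data frame
n_rows <- nrow(mtcars)   # number of rows (records)
n_cols <- ncol(mtcars)   # number of columns (variables)

# Counting one particular combination by subsetting first:
# rows with 4 cylinders and 4 gears
n_4cyl_4gear <- nrow(mtcars[mtcars$cyl == 4 & mtcars$gear == 4, ])

print(c(rows = n_rows, cols = n_cols, cyl4_gear4 = n_4cyl_4gear))
```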
There's a special function n() in dplyr to count rows (potentially within groups):

    library(dplyr)
    mtcars %>%
      group_by(cyl, gear) %>%
      summarise(n = n())
    #Source: local data frame [8 x 3]
    #Groups: cyl [?]
    #
    #    cyl  gear     n
    #  (dbl) (dbl) (int)
    #1     4     3     1
    #2     4     4     8
    #3     4     5     2
    #4     6     3     2
    #5     6     4     4
    #6     6     5     1
    #7     8     3    12
    #8     8     5     2
But dplyr also offers a handy count function which does exactly the same with less typing:

    count(mtcars, cyl, gear)          # or mtcars %>% count(cyl, gear)
    #Source: local data frame [8 x 3]
    #Groups: cyl [?]
    #
    #    cyl  gear     n
    #  (dbl) (dbl) (int)
    #1     4     3     1
    #2     4     4     8
    #3     4     5     2
    #4     6     3     2
    #5     6     4     4
    #6     6     5     1
    #7     8     3    12
    #8     8     5     2
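Another approach is to qualify the calls with the package name using double colons. This matters when plyr is loaded after dplyr, because plyr's summarise() then masks dplyr's version and ignores the grouping:

    mtcars %>%
      dplyr::group_by(cyl, gear) %>%
      dplyr::summarise(length(gear))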
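To see why the package prefix can matter, here is a minimal sketch that assumes both plyr and dplyr are installed. It attaches plyr after dplyr on purpose, checks which package the bare name summarise resolves to, and then counts rows with fully qualified dplyr calls. The names resolved_pkg and counts are made up for this example.

```r
library(dplyr)
library(plyr)   # attached second, so plyr's summarise() masks dplyr's

# Which package does the unqualified name come from now?
resolved_pkg <- environmentName(environment(summarise))
print(resolved_pkg)

# Qualified calls work regardless of the order the packages were loaded in
counts <- mtcars %>%
  dplyr::group_by(cyl, gear) %>%
  dplyr::summarise(n = dplyr::n())

print(counts)
```

Loading plyr before dplyr, or not loading plyr at all, is another way to avoid the clash.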