Summarizing count and conditional aggregate functions on the same factor

Q: What does the COUNT() function do in data aggregation?

The COUNT() function returns the number of rows that match a query. COUNT(*) counts every row, while COUNT(column) counts only the rows where that column is not NULL.

Q: What is conditional aggregation?

Conditional aggregation, as its name implies, means aggregating only the rows of a data set that meet a given condition, for example counting just the missing values or averaging just the valid ones.
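A minimal base R sketch of the idea, using made-up vectors (`prices` and `company` are illustrative names, not taken from the question below). Summing a logical vector counts the rows where the condition holds, and subsetting before aggregating restricts the aggregate to those rows.

```r
# Hypothetical data: four prices, one of them missing.
prices  <- c(5.67, 7.12, NA, 10.99)
company <- c("Acme", "Meca", "Meca", "Acme")

# Conditional count: TRUE counts as 1 when summed.
n_missing <- sum(is.na(prices))

# Conditional mean: keep only the rows that satisfy the condition.
acme_mean <- mean(prices[company == "Acme" & !is.na(prices)])
```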

Q: What does the Summarise function do in R?

As its name implies, the summarize function (summarise() in dplyr) reduces a data frame to a summary of one or more values. Often these summaries are computed per group, after first grouping the observations by a factor or other categorical variable.

Q: How do you aggregate data in R?

To compute a mean with the aggregate function in R, pass the numerical variable as the first argument, the categorical variable (as a list) as the second, and the function to apply (in this case mean) as the third. An alternative is to specify a formula of the form numerical ~ categorical.
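Both calling styles can be sketched on a small made-up data frame (`sales` and its values are assumptions for illustration only). The list form names its result column `x`, while the formula form keeps the original column name.

```r
# Hypothetical data frame with one categorical and one numerical column.
sales <- data.frame(
  company = c("Acme", "Meca", "Acme", "Meca"),
  price   = c(5.67, 7.12, 10.99, 3.00)
)

# Form 1: numerical variable, grouping variable as a list, then the function.
by_list <- aggregate(sales$price, by = list(company = sales$company), FUN = mean)

# Form 2: formula interface, numerical ~ categorical.
by_formula <- aggregate(price ~ company, data = sales, FUN = mean)
```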

Tags: r, dplyr

The short of it is that I'm having problems combining a count and conditional aggregate functions on the same factor in a single summary.

Suppose I have this dataframe:

library(dplyr)

df = tbl_df(data.frame(
    company=c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"), 
    year=c("2011", "2010", "2009", "2011", "2010", "2013"), 
    product=c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust", 
              "Kindness", "Helping Hand"), 
    price=c("5.67", "7.12", "12.99", "10.99", NA, FALSE)))

which creates this dataframe (in essence):

     company  year  product            price
1    Acme     2011  Wrench             5.67
2    Meca     2010  Hammer             7.12
3    Emca     2009  Sonic Screwdriver  12.99
4    Acme     2011  Fairy Dust         10.99
5    Meca     2010  Kindness           NA
...  ...      ...   ...                ...
n    Emca     2013  Helping Hand       FALSE

Let's say I want to run df <- group_by(df, company, year, product) and then get the following info all in one collection (i.e. a dataframe):

Count of each price listing (including NA and FALSE)
Count of each with the NA condition
Average price excluding NA and FALSE
Max price

summarize(df, count = n()) #satisfies first item obviously

I'm having issues trying to get the others. I think I need to use pipe operators? If so, can anyone provide some guidance?

This is what I've tried and it is blatantly wrong, but I'm not sure where to go next:

 summarize(df,
           total.count = n(),
           count = filter(df, is.na(price)),
           avg.price = filter(df, !is.na(price), price != FALSE),
           max.price = max(filter(df, !is.na(price), price != FALSE))

And yes, I have reviewed documentation and I'm sure the answers are there, but they might be too advanced for my understanding. Thanks in advance!


asked Oct 27 '14 04:10

NewRRecruit

1 Answer

I'm assuming that your original dataset is similar to the one you created, i.e. with NA stored as a character string. You could specify na.strings while reading the data with read.table, although I'd guess the NAs would be detected automatically.
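As a rough sketch of the na.strings option, here is read.table applied to an inline string instead of a file (the `raw` text and the "missing" marker are invented for illustration). The default is na.strings = "NA", so a literal NA is picked up without any extra argument; other markers have to be listed explicitly.

```r
# Hypothetical input; in practice this would come from a file.
raw <- "company price
Acme 5.67
Meca NA
Emca missing"

# Treat both "NA" and "missing" as missing values while reading.
dat <- read.table(text = raw, header = TRUE, na.strings = c("NA", "missing"))

# price is read as numeric because every non-missing entry is a number.
str(dat)
```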

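The price column is a factor, which needs to be converted to numeric. When you apply as.numeric(as.character(price)), all the non-numeric elements (i.e. "NA", FALSE) get coerced to NA, with a warning. (From R 4.0.0 on, data.frame() keeps strings as character by default, so the as.character step is harmless but no longer required.)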
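A small illustration of why the as.character step matters for factors (the `price` vector here is a made-up stand-in for the column in the question). Calling as.numeric directly on a factor returns the internal level codes rather than the values the labels spell out.

```r
# Hypothetical factor column mixing numbers with "NA" and "FALSE" labels.
price <- factor(c("5.67", "7.12", "NA", "FALSE"))

# Direct conversion gives integer level codes, not prices.
codes <- as.numeric(price)

# Converting through character gives the prices; labels that are not
# numbers become NA (the coercion warning is suppressed here for brevity).
values <- suppressWarnings(as.numeric(as.character(price)))
```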

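library(dplyr)

df %>%
  # convert price to numeric; "NA" and "FALSE" become NA (with a warning)
  mutate(price = as.numeric(as.character(price))) %>%
  group_by(company, year, product) %>%
  summarise(total.count = n(),                       # all rows, including NA/FALSE
            count       = sum(is.na(price)),         # rows with a missing price
            avg.price   = mean(price, na.rm = TRUE), # mean of the valid prices
            max.price   = max(price, na.rm = TRUE))  # max of the valid prices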
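One edge case worth knowing: when every price in a group is NA, mean(..., na.rm = TRUE) returns NaN and max(..., na.rm = TRUE) returns -Inf with a warning. With the sample data, each company/year/product group has a single row, so the "Kindness" and "Helping Hand" groups hit this case. A possible workaround, sketched in base R with a hypothetical helper named `safe_max` (not part of dplyr or of the answer above):

```r
# A group in which every price is missing.
all_missing <- c(NA_real_, NA_real_)

mean_result <- mean(all_missing, na.rm = TRUE)                  # NaN
max_result  <- suppressWarnings(max(all_missing, na.rm = TRUE)) # -Inf

# Hypothetical helper: return NA instead of -Inf when nothing is left.
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x)
}
```

Inside summarise, `max.price = safe_max(price)` could then stand in for the max call if NA is the preferred result for empty groups.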

data

I am using the same dataset that was shown (except the ... row).

df = tbl_df(data.frame(
    company=c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"),
    year=c("2011", "2010", "2009", "2011", "2010", "2013"),
    product=c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust",
              "Kindness", "Helping Hand"),
    price=c("5.67", "7.12", "12.99", "10.99", "NA", FALSE)))

answered Oct 16 '22 09:10

akrun
