I'm trying to use the ddply method to take a dataframe with various info about 3000 movies and then calculate the mean gross of each genre. I'm new to R, and I've read all the questions on here relating to ddply, but I still can't seem to get it right. Here's what I have now:
> attach(movies)
> ddply(movies, Genre, mean(Gross))
Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress, :
.fun is not a function.
How am I supposed to write a function that takes the mean of the values in the "Gross" column for each set of movies, grouped by genre? I know this seems like a simple question, but the documentation is really confusing to me, and I'm not too familiar with R syntax yet.
Is there a method other than ddply that would make this easier?
Thanks!!
ddply: Split data frame, apply function, and return results in a data frame.
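To connect that description to the call in the question: mean(Gross) is evaluated before ddply ever runs, so .fun receives a single number instead of a function, which is what the error message is complaining about. The grouping variable also needs to be passed as .(Genre) or "Genre" rather than as a bare attached vector. A minimal sketch, assuming the plyr package is installed; the small movies data frame below is made up for illustration, and only the column names Genre and Gross come from the question.

```r
library(plyr)  # assumed installed; provides ddply() and summarize()

# Hypothetical stand-in for the asker's 3000-row data frame
movies <- data.frame(
  Genre = c("Action", "Action", "Comedy", "Comedy", "Drama"),
  Gross = c(100, 300, 50, 150, 80),
  stringsAsFactors = FALSE
)

# Split by Genre, then let summarize() compute one mean per piece.
# No attach() is needed: Gross is looked up inside each piece.
mean_gross_by_genre <- ddply(movies, .(Genre), summarize,
                             mean_gross = mean(Gross, na.rm = TRUE))
print(mean_gross_by_genre)

# Same result with an explicit function applied to each per-genre piece
mean_gross_alt <- ddply(movies, "Genre", function(piece) {
  data.frame(mean_gross = mean(piece$Gross, na.rm = TRUE))
})
```

The na.rm = TRUE argument is an assumption on my part; drop it if you would rather get NA for a genre with missing Gross values.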
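Here is an example using the tips dataset, which ships with the reshape2 package. Note that ddply itself comes from plyr, so that package has to be loaded.
library(plyr)                     # provides ddply() and summarize()
data(tips, package = "reshape2")  # load the tips data set
mean_tip_by_day <- ddply(tips, .(day), summarize, mean_tip = mean(tip / total_bill))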
Hope this is useful
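As for whether something other than ddply would be easier: base R can do the same grouping without any extra packages. A rough sketch, again using a made-up movies data frame in place of the real one (only the Genre and Gross column names are taken from the question):

```r
# Hypothetical stand-in for the asker's data frame
movies <- data.frame(
  Genre = c("Action", "Action", "Comedy", "Comedy", "Drama"),
  Gross = c(100, 300, 50, 150, 80),
  stringsAsFactors = FALSE
)

# aggregate() with a formula: mean of Gross for each Genre,
# returned as a data frame with one row per genre
by_genre <- aggregate(Gross ~ Genre, data = movies, FUN = mean)
print(by_genre)

# tapply() gives the same means as a named vector
gross_means <- tapply(movies$Gross, movies$Genre, mean)
print(gross_means)
```

aggregate() is probably the closer match if you want a data frame back, while tapply() is handy when a named vector is enough. Be aware that the formula interface of aggregate() drops rows with missing values by default.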