I'm trying to tidy a dataset, using dplyr. My variables contain percentages and straightforward values (in this case, page views and bounce rates). I've tried to summarize them this way:
require(dplyr)
df <- df %>%
  group_by(pagename) %>%
  summarise(pageviews = sum(pageviews),
            bounceRate = weighted.mean(bounceRate, pageviews))
But this returns:
Error: 'x' and 'w' must have the same length
My dataset does not have any NAs in either the page views or the bounce rates.
I'm not sure what I'm doing wrong. Maybe summarise() doesn't work with weighted.mean()?
EDIT
I've added some data:
### Source: local data frame [4 x 3]
###
###   pagename bounceRate pageviews
###      (chr)      (dbl)     (dbl)
### 1     url1   72.22222      1176
### 2     url2   46.42857       733
### 3     url2   76.92308       457
### 4     url3   62.06897       601
The weighted mean is the average obtained by multiplying each value by its weight (or probability), summing those products, and dividing the result by the sum of the weights. If the weights are expressed as proportions, they sum to 1, so the division leaves the result unchanged.
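As a quick check of that definition, here is a minimal sketch in base R using the two url2 rows from the sample data in the question. The variable names (bounce_rate, page_views, by_hand and so on) are only illustrative and do not come from the original post.

```r
# The two url2 rows from the sample data in the question
bounce_rate <- c(46.42857, 76.92308)
page_views  <- c(733, 457)

# Sum of (value * weight), divided by the sum of the weights
by_hand <- sum(bounce_rate * page_views) / sum(page_views)

# Base R's weighted.mean() performs the same calculation
built_in <- weighted.mean(bounce_rate, page_views)

# Rescaling the weights to proportions (which sum to 1) gives the same answer
proportions    <- page_views / sum(page_views)
as_proportions <- sum(bounce_rate * proportions)

print(c(by_hand = by_hand, built_in = built_in, as_proportions = as_proportions))
```

All three values should agree, at roughly 58.14 for these two rows. That is closer to the bounce rate of the row with more page views than the plain average of the two rates would be.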
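The summarize() command evaluates its arguments in the order they appear, and each new variable replaces any existing variable of the same name for the arguments that follow. Because you are changing the value of pageviews first, that new one-value-per-group sum is what gets passed to weighted.mean() as the weights. It's safer to use different names: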
df %>%
  group_by(pagename) %>%
  summarise(pageviews_sum = sum(pageviews),
            bounceRate_mean = weighted.mean(bounceRate, pageviews))
And if you really want, you can rename afterward:
df %>%
  group_by(pagename) %>%
  summarise(pageviews_sum = sum(pageviews),
            bounceRate_mean = weighted.mean(bounceRate, pageviews)) %>%
  rename(pageviews = pageviews_sum, bounceRate = bounceRate_mean)
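To see the ordering effect on the sample data from the question, something like the following sketch should work. The data frame is rebuilt by hand from the printed rows, and the names attempt and fixed are just placeholders. The exact wording of the error message varies between dplyr versions, so the sketch only checks that an error was raised.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  pagename   = c("url1", "url2", "url2", "url3"),
  bounceRate = c(72.22222, 46.42857, 76.92308, 62.06897),
  pageviews  = c(1176, 733, 457, 601),
  stringsAsFactors = FALSE
)

# Reusing the name pageviews: by the time weighted.mean() runs, pageviews is
# already the per-group sum (length 1), while url2 still has two bounceRate
# values, so the lengths no longer match.
attempt <- tryCatch(
  df %>%
    group_by(pagename) %>%
    summarise(pageviews = sum(pageviews),
              bounceRate = weighted.mean(bounceRate, pageviews)),
  error = function(e) e
)
print(inherits(attempt, "error"))

# Distinct output names leave the original pageviews column untouched
fixed <- df %>%
  group_by(pagename) %>%
  summarise(pageviews_sum = sum(pageviews),
            bounceRate_mean = weighted.mean(bounceRate, pageviews))
print(fixed)
```

Only url2 has more than one row here, which is why that group is the one that trips the length check. Groups with a single row would pass either way.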
I've found the solution.
Since pageviews=sum(pageviews) is evaluated before bounceRate=weighted.mean(bounceRate,pageviews), pageviews has already been reduced to a single value per group and is therefore shorter than bounceRate, which triggers the error.
The solution is simple, just switch the order:
require(dplyr)
df <- df %>%
  group_by(pagename) %>%
  summarise(bounceRate = weighted.mean(bounceRate, pageviews),
            pageviews = sum(pageviews))
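For anyone who wants to try the reordered version, here is a small sketch against the sample rows from the question. The data frame is retyped by hand from the printed output, and result is just an illustrative name.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  pagename   = c("url1", "url2", "url2", "url3"),
  bounceRate = c(72.22222, 46.42857, 76.92308, 62.06897),
  pageviews  = c(1176, 733, 457, 601),
  stringsAsFactors = FALSE
)

# weighted.mean() runs first, while pageviews still holds one value per row;
# only afterwards is pageviews collapsed to its per-group sum.
result <- df %>%
  group_by(pagename) %>%
  summarise(bounceRate = weighted.mean(bounceRate, pageviews),
            pageviews = sum(pageviews))

print(result)
```

This relies on the argument order staying as written. If someone later reorders the arguments, the error comes back, which is the reason the other answer suggests distinct names.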