Using summarise with weighted mean from dplyr in R

Tags: r, dplyr

I'm trying to tidy a dataset using dplyr. My variables contain both plain counts and percentages (in this case, page views and bounce rates). I've tried to summarize them this way:

require(dplyr)
df <- df %>%
  group_by(pagename) %>%
  summarise(pageviews = sum(pageviews),
            bounceRate = weighted.mean(bounceRate, pageviews))

But this returns:

 Error: 'x' and 'w' must have the same length

My dataset does not have any NAs in either the page views or the bounce rates. I'm not sure what I'm doing wrong. Maybe summarise() doesn't work with weighted.mean()?

EDIT

I've added some data:

### Source: local data frame [4 x 3]
###
###    pagename bounceRate pageviews
###       (chr)      (dbl)     (dbl)
### 1      url1   72.22222      1176
### 2      url2   46.42857       733
### 3      url2   76.92308       457
### 4      url3   62.06897       601


asked Mar 23 '17 14:03

Tobias van Elferen

2 Answers

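The summarize() command evaluates its arguments in the order they appear, and each later expression sees the columns created by the earlier ones. Because the first expression overwrites pageviews with its per-group sum, weighted.mean() receives that single summed value as its weights instead of the original per-row page views. It's safer to use different names for the summary columns: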

df %>%
   group_by(pagename)%>%
   summarise(pageviews_sum = sum(pageviews), 
      bounceRate_mean = weighted.mean(bounceRate,pageviews))

And if you really want the original names, you can rename afterward:

df %>%
   group_by(pagename) %>%
   summarise(pageviews_sum = sum(pageviews), 
      bounceRate_mean = weighted.mean(bounceRate,pageviews)) %>% 
   rename(pageviews = pageviews_sum, bounceRate = bounceRate_mean)
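
For anyone who wants to check this against the sample rows in the question, here is a minimal sketch. The data frame is typed in by hand from the question's printout, and `result` and `manual_url2` are names I made up for illustration. It runs the distinct-names version and compares the url2 group with the weighted mean worked out by hand.

```r
library(dplyr)

# Sample rows copied by hand from the question's printout
df <- data.frame(
  pagename   = c("url1", "url2", "url2", "url3"),
  bounceRate = c(72.22222, 46.42857, 76.92308, 62.06897),
  pageviews  = c(1176, 733, 457, 601),
  stringsAsFactors = FALSE
)

# Distinct output names, so weighted.mean() still sees the per-row pageviews
result <- df %>%
  group_by(pagename) %>%
  summarise(
    pageviews_sum   = sum(pageviews),
    bounceRate_mean = weighted.mean(bounceRate, pageviews)
  )

# Hand calculation for url2: sum(rate * views) / sum(views)
manual_url2 <- (46.42857 * 733 + 76.92308 * 457) / (733 + 457)

print(result)
print(manual_url2)
```

The url2 row of `result` should agree with `manual_url2`, and the two single-row groups simply keep their original bounce rate.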

answered Oct 20 '22 06:10

MrFlick

I've found the solution. Since pageviews=sum(pageviews) is evaluated before bounceRate= weighted.mean(bounceRate,pageviews), pageviews has already been reduced to a single value per group by the time the weighted mean is computed. For any group with more than one row it is therefore shorter than bounceRate, which triggers the error.

The solution is simple, just switch them:

require(dplyr)
df <- df %>%
  group_by(pagename) %>%
  summarise(bounceRate = weighted.mean(bounceRate, pageviews),
            pageviews = sum(pageviews))
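
To see the length mismatch on its own, outside dplyr, here is a rough base R sketch using only the two url2 rows from the sample data. The names `collapsed`, `msg` and `ok` are just illustrative. It is not what dplyr does internally, only an imitation of what weighted.mean() ends up receiving once pageviews has been replaced by its sum.

```r
# The two url2 rows from the sample data in the question
bounceRate <- c(46.42857, 76.92308)
pageviews  <- c(733, 457)

# Imitate the overwritten column: one summed value instead of two weights
collapsed <- sum(pageviews)

# x has length 2 and w has length 1, so weighted.mean() stops with an error;
# tryCatch() captures the message instead of aborting the script
msg <- tryCatch(
  weighted.mean(bounceRate, collapsed),
  error = function(e) conditionMessage(e)
)
print(msg)

# With the original per-row weights the lengths match and the call succeeds
ok <- weighted.mean(bounceRate, pageviews)
print(ok)
```

A group with a single row (url1 or url3 here) would not raise the error, since both vectors have length 1, so the failure only shows up when a pagename occurs more than once.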

answered Oct 20 '22 04:10

Tobias van Elferen
