New to R, so I am just getting my head around the data wrangling aspect. I tried looking for a similar question but couldn't find one.
I would like to add a column that gives each article's percentage share of the total views for its day. Example dataset below:
views date article
1578 2015-01-01 A
616 2015-01-01 B
575 2015-01-01 C
1744 2015-01-02 A
541 2015-01-02 B
660 2015-01-02 C
2906 2015-01-03 A
629 2015-01-03 B
643 2015-01-03 C
And the expected result I am looking for:
views percentage date article
1578 56.99 2015-01-01 A
616 22.25 2015-01-01 B
575 20.77 2015-01-01 C
1744 59.22 2015-01-02 A
541 18.37 2015-01-02 B
660 22.41 2015-01-02 C
2906 69.55 2015-01-03 A
629 15.06 2015-01-03 B
643 15.39 2015-01-03 C
I know this is possible by splitting the data frame into subsets, but I would hope there is a neater approach using a library?
Thanks!
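library(dplyr)
# Share of each row's views within its date, as a proportion between 0 and 1
# (multiply by 100 to get a percentage)
df %>% group_by(date) %>% mutate(percentage = views / sum(views))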
Source: local data frame [9 x 4]
Groups: date
views date article percentage
1 1578 2015-01-01 A 0.5698808
2 616 2015-01-01 B 0.2224630
3 575 2015-01-01 C 0.2076562
4 1744 2015-01-02 A 0.5921902
5 541 2015-01-02 B 0.1837012
6 660 2015-01-02 C 0.2241087
7 2906 2015-01-03 A 0.6955481
8 629 2015-01-03 B 0.1505505
9 643 2015-01-03 C 0.1539014
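The output above is a proportion rather than the two-decimal percentage shown in the question. A minimal sketch of one way to get that scale, assuming the sample data is rebuilt as a plain data frame (the name `result` is only illustrative):

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  views   = c(1578, 616, 575, 1744, 541, 660, 2906, 629, 643),
  date    = rep(c("2015-01-01", "2015-01-02", "2015-01-03"), each = 3),
  article = rep(c("A", "B", "C"), times = 3)
)

result <- df %>%
  group_by(date) %>%                                          # one group per day
  mutate(percentage = round(100 * views / sum(views), 2)) %>% # share of the daily total
  ungroup()                                                   # drop grouping afterwards

result
```

Because each value is rounded separately, the percentages for a day may not add up to exactly 100.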
Or, if multiple identical articles are possible per day:
df %>% group_by(date) %>% mutate(sum = sum(views)) %>%
  group_by(date, article) %>% mutate(percentage = views/sum) %>%
  select(-sum)
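If the goal is one row per article per day, with repeated rows for the same article combined before the share is computed, one possible reading is to aggregate first and then divide by the daily total. This is only a sketch: the data frame `df_dup` and its values are made up for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: article A appears twice on the same day
df_dup <- data.frame(
  views   = c(100, 50, 50, 200),
  date    = rep("2015-01-01", 4),
  article = c("A", "A", "B", "C")
)

shares <- df_dup %>%
  group_by(date, article) %>%
  summarise(views = sum(views), .groups = "drop") %>% # combine duplicate articles
  group_by(date) %>%
  mutate(percentage = views / sum(views)) %>%         # share of the daily total
  ungroup()

shares
```

Here A gets 150 of 400 views, so its share is 0.375, while B and C get 0.125 and 0.5.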
If df is your data.frame, you can do:
library(data.table)
setDT(df)[, percentage := signif(100 * views / sum(views), 4), by = date][]
# views date article percentage
#1: 1578 2015-01-01 A 56.99
#2: 616 2015-01-01 B 22.25
#3: 575 2015-01-01 C 20.77
#4: 1744 2015-01-02 A 59.22
#5: 541 2015-01-02 B 18.37
#6: 660 2015-01-02 C 22.41
#7: 2906 2015-01-03 A 69.55
#8: 629 2015-01-03 B 15.06
#9: 643 2015-01-03 C 15.39
Or base R:
df$percentage <- signif(100 * with(df, ave(views, date, FUN = function(x) x / sum(x))), 4)
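One thing to keep in mind with `signif(..., 4)`: it keeps four significant digits, which matches two decimal places only for values between 10 and 100. If two decimals are wanted regardless of magnitude, `round(..., 2)` may be the better fit. A small sketch with made-up values (the vector `x` is hypothetical):

```r
# Hypothetical values chosen to show where the two functions differ
x <- c(56.98808, 5.43219, 100.004)

by_signif <- signif(x, 4) # 4 significant digits: 56.99, 5.432, 100
by_round  <- round(x, 2)  # 2 decimal places:     56.99, 5.43,  100

by_signif
by_round
```

For the sample data in the question every percentage falls between 10 and 100, so both functions give the same result there.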
Data:
df = structure(list(views = c(1578L, 616L, 575L, 1744L, 541L, 660L,
2906L, 629L, 643L), date = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("2015-01-01", "2015-01-02", "2015-01-03"
), class = "factor"), article = structure(c(1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"),
percentage = c(56.99, 22.25, 20.77, 59.22, 18.37, 22.41,
69.55, 15.06, 15.39)), .Names = c("views", "date", "article",
"percentage"), class = "data.frame", row.names = c(NA, -9L))