I have a dataframe like this:
Setting q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam
and I would like to divide columns 2, 3, 4, 5, 6 by the total for that particular country.
Doing it in base R is a bit clumsy:
df[df$Country=="Vietnam",][c(3, 4, 5, 6)] = df[df$Country=="Vietnam",][c(3, 4, 5, 6)] / sum(df[df$Country=="Vietnam",][c(3, 4, 5, 6)])
(I think that works).
I'm trying to convert as much of my code as possible to use tidyverse functions. Is there a way of doing the same thing more efficiently using, dplyr, for instance?
Thanks.
I trust this is what you are after:
Divide each column by the sum of that column - grouped by Country
library(tidyverse)
df1 %>%
group_by(Country) %>%
mutate_at(vars(c_school: c_leisure), funs(./ sum(.)))
#output
Setting q02_id c_school c_home c_work c_transport c_leisure Country
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 Rural 11900006 NaN 0.238 0.600 1.00 0.250 Vietnam
2 Rural 11900031 1.00 1.00 NaN NaN NaN China
3 Rural 11900033 NaN 0.143 0 0 0.750 Vietnam
4 Rural 11900053 NaN 0.333 0.400 0 0 Vietnam
5 Rural 11900114 1.00 1.00 NaN NaN NaN Malaysia
6 Rural 11900446 NaN 0.286 0 0 0 Vietnam
or alternatively divide each column by the total sum for each country as in your example (only difference is I used columns 3:7 as I trust you intended.
df1 %>%
mutate(sum = rowSums(.[,3:7])) %>%
group_by(Country) %>%
mutate_at(vars(c_school: c_leisure), funs(./ sum(sum))) %>%
select(-sum)
#output
Setting q02_id c_school c_home c_work c_transport c_leisure Country
<fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 Rural 11900006 0 0.161 0.0968 0.0323 0.0323 Vietnam
2 Rural 11900031 0.667 0.333 0 0 0 China
3 Rural 11900033 0 0.0968 0 0 0.0968 Vietnam
4 Rural 11900053 0 0.226 0.0645 0 0 Vietnam
5 Rural 11900114 0.333 0.667 0 0 0 Malaysia
6 Rural 11900446 0 0.194 0 0 0 Vietnam
data:
df1 = read.table(text ="Setting q02_id c_school c_home c_work c_transport c_leisure Country
Rural 11900006 0 5 3 1 1 Vietnam
Rural 11900031 10 5 0 0 0 China
Rural 11900033 0 3 0 0 3 Vietnam
Rural 11900053 0 7 2 0 0 Vietnam
Rural 11900114 3 6 0 0 0 Malaysia
Rural 11900446 0 6 0 0 0 Vietnam", header = T)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With