Conditional mutate cumsum dlpyr

Question

I have towns (from A to D), which have different populations, and are at different distances. The objective is to add up the total population living within the circle of radius (distance XY) where X is a town in the centre of the circle and Y any other town.

In this code:

    Df <- structure(list(Town_From = c("A", "A", "A", "B", "B", "C"), Town_To = c("B", 
    "C", "D", "C", "D", "D"), Distance = c(10, 5, 18, 17, 20, 21)), .Names = c("Town_From", 
    "Town_To", "Distance"), row.names = c(NA, -6L), class = "data.frame")

    Df2 <- structure(list(Town = c("A", "B", "C", "D"), Population = c(1000, 
    800, 500, 200)), .Names = c("Town", "Population"), row.names = c(NA, 
    -4L), class = "data.frame")

    Df <- Df %>% left_join(Df2,by=c("Town_From"="Town")) %>% 
      left_join(Df2,by=c("Town_To"="Town"))%>%
      group_by(Town_From) %>% 
      arrange(Distance)
    colnames(Df)[4]<-c("pop_TF")
    colnames(Df)[5]<-c("pop_TT")
Source: local data frame [6 x 5]
Groups: Town_From [3]

  Town_From Town_To Distance pop_TF pop_TT
      <chr>   <chr>    <dbl>  <dbl>  <dbl>
1         A       C        5   1000    500
2         A       B       10   1000    800
3         B       C       17    800    500
4         A       D       18   1000    200
5         B       D       20    800    200
6         C       D       21    500    200

towns have been organised by (Town_From) and arranged by (distance).

Within the circle of 5km radius (from A to C) live 1000 (in A) + 500 (in C)= 1500 people; within the next circle live 1500 + 800 (in B) =2300. Within the third circle still live 2300 people because towns A, B, C are within the circle radius B to C = 17 km. Within the Circle radius A to D = 18km, live 2300 + 200 (in D)=2500people.

Here is a visualization of the circles in question. In theory, the circles could expand to any arbitrary radius. In practice, I only need to check them at the distances between pairs of towns (places where the counts change).

enter image description here

Mark Peterson · Accepted Answer

For this, it is easier if you can put your data into a format where each town is represented on each "end" of the distance (both the to and the from). So, I changed the change you made at the end to Df to this instead. Note that it uses complete from tidyr.

Df_full <-
  Df %>%
  bind_rows(
    select(Df, Town_From = Town_To, Town_To = Town_From, Distance)
  ) %>%
  complete(Town_From, Town_To, fill = list(Distance = 0)) %>%
  left_join(Df2, c("Town_To" = "Town"))

This reverses the to-from relationship and appends it to the bottom of the list. Then, it uses complete to add the town as its own "To" (e.g., From A to A). Finally, it joins the populations in, but they now only need to be added once. Here is the new data:

# A tibble: 16 × 4
   Town_From Town_To Distance Population
       <chr>   <chr>    <dbl>      <dbl>
1          A       A        0       1000
2          A       B       10        800
3          A       C        5        500
4          A       D       18        200
5          B       A       10       1000
6          B       B        0        800
7          B       C       17        500
8          B       D       20        200
9          C       A        5       1000
10         C       B       17        800
11         C       C        0        500
12         C       D       21        200
13         D       A       18       1000
14         D       B       20        800
15         D       C       21        500
16         D       D        0        200

Next, we set the thresholds we want to explore. In your question, you imply that you want to use each of the unique pair-wise distances. If you prefer some other set for your production use, just enter them here.

radiusCuts <-
  Df_full$Distance %>%
  unique %>%
  sort

Then, we construct a sum command that will sum only paired cities within the radius, setting the names in the process to ease the use of summarise_ in a moment.

forPops <-
  radiusCuts %>%
  setNames(paste("Pop within", ., "km")) %>%
  lapply(function(x){
    paste("sum(Population[Distance <=", x,"])")
  })

Finally, we group_by the Town_From and pass those constructed arguments to the standard evaluation function summarise_ which will create each of the columns in forPops:

Df_full %>%
  group_by(Town_From) %>%
  summarise_(.dots = forPops)

gives:

# A tibble: 4 × 8
  Town_From `Pop within 0 km` `Pop within 5 km` `Pop within 10 km` `Pop within 17 km` `Pop within 18 km` `Pop within 20 km` `Pop within 21 km`
      <chr>             <dbl>             <dbl>              <dbl>              <dbl>              <dbl>              <dbl>              <dbl>
1         A              1000              1500               2300               2300               2500               2500               2500
2         B               800               800               1800               2300               2300               2500               2500
3         C               500              1500               1500               2300               2300               2300               2500
4         D               200               200                200                200               1200               2000               2500

Which should give you all the thresholds you want.

Conditional mutate cumsum dlpyr

Tags:

r

dplyr

conditional

cumsum

JPV

1 Answers

Mark Peterson

Recent Activity

Donate For Us

Conditional mutate cumsum dlpyr

Tags:

r

dplyr

conditional

cumsum

JPV

1 Answers

Mark Peterson

Related questions

Recent Activity

Donate For Us