Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditional mutate cumsum dlpyr

I have towns (from A to D), which have different populations, and are at different distances. The objective is to add up the total population living within the circle of radius (distance XY) where X is a town in the centre of the circle and Y any other town.

In this code:

    Df <- structure(list(Town_From = c("A", "A", "A", "B", "B", "C"), Town_To = c("B", 
    "C", "D", "C", "D", "D"), Distance = c(10, 5, 18, 17, 20, 21)), .Names = c("Town_From", 
    "Town_To", "Distance"), row.names = c(NA, -6L), class = "data.frame")

    Df2 <- structure(list(Town = c("A", "B", "C", "D"), Population = c(1000, 
    800, 500, 200)), .Names = c("Town", "Population"), row.names = c(NA, 
    -4L), class = "data.frame")

    Df <- Df %>% left_join(Df2,by=c("Town_From"="Town")) %>% 
      left_join(Df2,by=c("Town_To"="Town"))%>%
      group_by(Town_From) %>% 
      arrange(Distance)
    colnames(Df)[4]<-c("pop_TF")
    colnames(Df)[5]<-c("pop_TT")
Source: local data frame [6 x 5]
Groups: Town_From [3]

  Town_From Town_To Distance pop_TF pop_TT
      <chr>   <chr>    <dbl>  <dbl>  <dbl>
1         A       C        5   1000    500
2         A       B       10   1000    800
3         B       C       17    800    500
4         A       D       18   1000    200
5         B       D       20    800    200
6         C       D       21    500    200

towns have been organised by (Town_From) and arranged by (distance).

Within the circle of 5km radius (from A to C) live 1000 (in A) + 500 (in C)= 1500 people; within the next circle live 1500 + 800 (in B) =2300. Within the third circle still live 2300 people because towns A, B, C are within the circle radius B to C = 17 km. Within the Circle radius A to D = 18km, live 2300 + 200 (in D)=2500people.

Here is a visualization of the circles in question. In theory, the circles could expand to any arbitrary radius. In practice, I only need to check them at the distances between pairs of towns (places where the counts change).

enter image description here

like image 252
JPV Avatar asked Jan 18 '17 13:01

JPV


1 Answers

For this, it is easier if you can put your data into a format where each town is represented on each "end" of the distance (both the to and the from). So, I changed the change you made at the end to Df to this instead. Note that it uses complete from tidyr.

Df_full <-
  Df %>%
  bind_rows(
    select(Df, Town_From = Town_To, Town_To = Town_From, Distance)
  ) %>%
  complete(Town_From, Town_To, fill = list(Distance = 0)) %>%
  left_join(Df2, c("Town_To" = "Town"))

This reverses the to-from relationship and appends it to the bottom of the list. Then, it uses complete to add the town as its own "To" (e.g., From A to A). Finally, it joins the populations in, but they now only need to be added once. Here is the new data:

# A tibble: 16 × 4
   Town_From Town_To Distance Population
       <chr>   <chr>    <dbl>      <dbl>
1          A       A        0       1000
2          A       B       10        800
3          A       C        5        500
4          A       D       18        200
5          B       A       10       1000
6          B       B        0        800
7          B       C       17        500
8          B       D       20        200
9          C       A        5       1000
10         C       B       17        800
11         C       C        0        500
12         C       D       21        200
13         D       A       18       1000
14         D       B       20        800
15         D       C       21        500
16         D       D        0        200

Next, we set the thresholds we want to explore. In your question, you imply that you want to use each of the unique pair-wise distances. If you prefer some other set for your production use, just enter them here.

radiusCuts <-
  Df_full$Distance %>%
  unique %>%
  sort

Then, we construct a sum command that will sum only paired cities within the radius, setting the names in the process to ease the use of summarise_ in a moment.

forPops <-
  radiusCuts %>%
  setNames(paste("Pop within", ., "km")) %>%
  lapply(function(x){
    paste("sum(Population[Distance <=", x,"])")
  })

Finally, we group_by the Town_From and pass those constructed arguments to the standard evaluation function summarise_ which will create each of the columns in forPops:

Df_full %>%
  group_by(Town_From) %>%
  summarise_(.dots = forPops)

gives:

# A tibble: 4 × 8
  Town_From `Pop within 0 km` `Pop within 5 km` `Pop within 10 km` `Pop within 17 km` `Pop within 18 km` `Pop within 20 km` `Pop within 21 km`
      <chr>             <dbl>             <dbl>              <dbl>              <dbl>              <dbl>              <dbl>              <dbl>
1         A              1000              1500               2300               2300               2500               2500               2500
2         B               800               800               1800               2300               2300               2500               2500
3         C               500              1500               1500               2300               2300               2300               2500
4         D               200               200                200                200               1200               2000               2500

Which should give you all the thresholds you want.

like image 157
Mark Peterson Avatar answered Sep 30 '22 09:09

Mark Peterson