I have towns (from A to D), which have different populations, and are at different distances. The objective is to add up the total population living within the circle of radius (distance XY) where X is a town in the centre of the circle and Y any other town.
In this code:
Df <- structure(list(Town_From = c("A", "A", "A", "B", "B", "C"), Town_To = c("B",
"C", "D", "C", "D", "D"), Distance = c(10, 5, 18, 17, 20, 21)), .Names = c("Town_From",
"Town_To", "Distance"), row.names = c(NA, -6L), class = "data.frame")
Df2 <- structure(list(Town = c("A", "B", "C", "D"), Population = c(1000,
800, 500, 200)), .Names = c("Town", "Population"), row.names = c(NA,
-4L), class = "data.frame")
Df <- Df %>% left_join(Df2,by=c("Town_From"="Town")) %>%
left_join(Df2,by=c("Town_To"="Town"))%>%
group_by(Town_From) %>%
arrange(Distance)
colnames(Df)[4]<-c("pop_TF")
colnames(Df)[5]<-c("pop_TT")
Source: local data frame [6 x 5]
Groups: Town_From [3]
Town_From Town_To Distance pop_TF pop_TT
<chr> <chr> <dbl> <dbl> <dbl>
1 A C 5 1000 500
2 A B 10 1000 800
3 B C 17 800 500
4 A D 18 1000 200
5 B D 20 800 200
6 C D 21 500 200
towns have been organised by (Town_From) and arranged by (distance).
Within the circle of 5km radius (from A to C) live 1000 (in A) + 500 (in C)= 1500 people; within the next circle live 1500 + 800 (in B) =2300. Within the third circle still live 2300 people because towns A, B, C are within the circle radius B to C = 17 km. Within the Circle radius A to D = 18km, live 2300 + 200 (in D)=2500people.
Here is a visualization of the circles in question. In theory, the circles could expand to any arbitrary radius. In practice, I only need to check them at the distances between pairs of towns (places where the counts change).
For this, it is easier if you can put your data into a format where each town is represented on each "end" of the distance (both the to and the from). So, I changed the change you made at the end to Df
to this instead. Note that it uses complete
from tidyr
.
Df_full <-
Df %>%
bind_rows(
select(Df, Town_From = Town_To, Town_To = Town_From, Distance)
) %>%
complete(Town_From, Town_To, fill = list(Distance = 0)) %>%
left_join(Df2, c("Town_To" = "Town"))
This reverses the to-from relationship and appends it to the bottom of the list. Then, it uses complete
to add the town as its own "To" (e.g., From A to A). Finally, it joins the populations in, but they now only need to be added once. Here is the new data:
# A tibble: 16 × 4
Town_From Town_To Distance Population
<chr> <chr> <dbl> <dbl>
1 A A 0 1000
2 A B 10 800
3 A C 5 500
4 A D 18 200
5 B A 10 1000
6 B B 0 800
7 B C 17 500
8 B D 20 200
9 C A 5 1000
10 C B 17 800
11 C C 0 500
12 C D 21 200
13 D A 18 1000
14 D B 20 800
15 D C 21 500
16 D D 0 200
Next, we set the thresholds we want to explore. In your question, you imply that you want to use each of the unique pair-wise distances. If you prefer some other set for your production use, just enter them here.
radiusCuts <-
Df_full$Distance %>%
unique %>%
sort
Then, we construct a sum
command that will sum only paired cities within the radius, setting the names in the process to ease the use of summarise_
in a moment.
forPops <-
radiusCuts %>%
setNames(paste("Pop within", ., "km")) %>%
lapply(function(x){
paste("sum(Population[Distance <=", x,"])")
})
Finally, we group_by
the Town_From
and pass those constructed arguments to the standard evaluation function summarise_
which will create each of the columns in forPops
:
Df_full %>%
group_by(Town_From) %>%
summarise_(.dots = forPops)
gives:
# A tibble: 4 × 8
Town_From `Pop within 0 km` `Pop within 5 km` `Pop within 10 km` `Pop within 17 km` `Pop within 18 km` `Pop within 20 km` `Pop within 21 km`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 1000 1500 2300 2300 2500 2500 2500
2 B 800 800 1800 2300 2300 2500 2500
3 C 500 1500 1500 2300 2300 2300 2500
4 D 200 200 200 200 1200 2000 2500
Which should give you all the thresholds you want.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With