I would like to create population pyramids in R. I know that there are many examples here on StackOverflow, but I would like to create one that also include population projections, i.e. with bars for each age group by sex and lines by sex and age group for the projections.
You can see an example here: http://geographyblog.eu/wp/the-worlds-population-pyramid-is-changing-shape/
If there are some suggestions on how to better illustrate this (e.g. with smoothed lines), they are also welcome, but I would like to indicate both the current situation and the projection. Example data can be found on UN's website: http://esa.un.org/wpp/population-pyramids/population-pyramids_absolute.htm
Any help would be greatly appreciated.
To create a population pyramid, we use the coord_flip() function along with the geom_bar() function to create a horizontal bar plot, then we make the value of the male population negative using the mutate function thus creating the male population bars on the left side and female population bar on the right side giving ...
There are generally three types of population pyramids created from age-sex distributions-- expansive, constrictive and stationary.
Population Pyramid and Demographic Transition The stages of demographic transition are – (i) high stationary stage; (ii) early expanding stage; (iii) late expanding stage; (iv) low stationary stage and (v) declining or negative growth rate stage.
You can cook up something easily using answers from this question (here I used @timriffle's answer as well as mine).
First some data (from the link you provided):
wp <- structure(list(M.1990 = c(325814, 295272, 269351, 265163, 249651, 220027, 196523, 178295, 141789, 115097, 106579, 91763, 77150, 56845, 38053, 25716, 19442), M.2000 = c(319675, 317296, 317072, 290827, 262992, 256378, 241401, 212924, 188905, 169133, 131813, 103162, 90921, 72231, 53449, 32707, 25868), M.2010 = c(328759, 315119, 311456, 312831, 311077, 284258, 255596, 248575, 232217, 202633, 176241, 153494, 114194, 83129, 65266, 43761, 39223), F.1990 = c(308121, 281322, 257432, 254065, 238856, 211943, 188433, 170937, 138358, 112931, 106510, 93425, 82667, 67057, 47679, 37435, 36724), F.2000 = c(298455, 297012, 299757, 277706, 252924, 248127, 233583, 207518, 183646, 165444, 132307, 105429, 96681, 80227, 64956, 45832, 46413), F.2010 = c(307079, 293664, 290598, 293313, 295739, 273379, 247383, 241938, 226914, 201142, 176440, 156283, 121200, 92071, 77990, 56895, 66029)), .Names = c("M.1990", "M.2000", "M.2010", "F.1990", "F.2000", "F.2010"), row.names = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80+"), class = "data.frame")
wp
M.1990 M.2000 M.2010 F.1990 F.2000 F.2010
0-4 325814 319675 328759 308121 298455 307079
5-9 295272 317296 315119 281322 297012 293664
10-14 269351 317072 311456 257432 299757 290598
15-19 265163 290827 312831 254065 277706 293313
20-24 249651 262992 311077 238856 252924 295739
25-29 220027 256378 284258 211943 248127 273379
30-34 196523 241401 255596 188433 233583 247383
35-39 178295 212924 248575 170937 207518 241938
40-44 141789 188905 232217 138358 183646 226914
45-49 115097 169133 202633 112931 165444 201142
50-54 106579 131813 176241 106510 132307 176440
55-59 91763 103162 153494 93425 105429 156283
60-64 77150 90921 114194 82667 96681 121200
65-69 56845 72231 83129 67057 80227 92071
70-74 38053 53449 65266 47679 64956 77990
75-79 25716 32707 43761 37435 45832 56895
80+ 19442 25868 39223 36724 46413 66029
xrange <- range(c(0,wp))
yrange <- range(c(0,nrow(wp)))
And then the plotting part (in two panels):
par(mfcol=c(1,2))
par(mar=c(5,4,4,0))
plot(NA,type="n", main="Men", xlab="", ylab="", xaxs="i",
xlim=rev(xrange), ylim=yrange, axes=FALSE, yaxs="i")
rect(xrange[1],yrange[1],xrange[2],yrange[2], col="cadetblue")
abline(v=seq(0,xrange[2],by=1e5), col="white")
# All years with bars you want to represent filled
# should be entered in reverse order
polygon(c(0,rep(wp$M.2000,each=2), 0), c(0,0,rep(1:nrow(wp),each=2)),
col="lightblue",border="lightblue")
polygon(c(0,rep(wp$M.1990,each=2), 0), c(0,0,rep(1:nrow(wp),each=2)),
col="darkblue",border="darkblue")
# And those you want with just a border, afterwards:
polygon(c(0,rep(wp$M.2010,each=2), 0), c(0,0,rep(1:nrow(wp),each=2)),
col=NA,border="darkred",lwd=2)
axis(1, at=c(0,1e5,2e5,3e5), labels=format(c(0,1e5,2e5,3e5),scientific=FALSE))
axis(2, at=1:nrow(wp)-0.5,labels=row.names(wp),las=2)
box()
par(mar=c(5,0,4,4))
plot(NA,type="n", main="Women", xlab="", ylab="", xaxs="i",
xlim=xrange, ylim=yrange, axes=FALSE, yaxs="i")
rect(xrange[1],yrange[1],xrange[2],yrange[2], col="cadetblue")
abline(v=seq(0,xrange[2],by=1e5), col="white")
polygon(c(0,rep(wp$F.2000,each=2), 0), c(0,0,rep(1:nrow(wp),each=2)),
col="lightblue",border="lightblue")
polygon(c(0,rep(wp$F.1990,each=2), 0), c(0,0,rep(1:nrow(wp),each=2)),
col="darkblue",border="darkblue")
polygon(c(0,rep(wp$F.2010,each=2), 0), c(0,0,rep(1:nrow(wp),each=2)),
col=NA,border="darkred",lwd=2)
axis(1, at=c(0,1e5,2e5,3e5), labels=format(c(0,1e5,2e5,3e5),scientific=FALSE))
axis(4, at=1:nrow(wp)-0.5,labels=row.names(wp),las=2)
box()
To circumvent the issue highlighted by @Spacedman in his comment, you can use an alpha for some of the years.
library(scales)
[...]
polygon(c(0,rep(wp$M.1990,each=2), 0), c(0,0,rep(1:nrow(wp),each=2)),
col=alpha("darkblue",0.4),border="darkblue")
[...]
polygon(c(0,rep(wp$F.1990,each=2), 0), c(0,0,rep(1:nrow(wp),each=2)),
col=alpha("darkblue",0.4),border="darkblue")
[...]
Perhaps a little less ad-hoc method uses ggplot2
and geom_bar
and geom_step
.
The data can be extracted from the wpp2015
package (or wpp2012
, wpp2010
or wpp2008
if you prefer older revisions).
library("dplyr")
library("tidyr")
library("wpp2015")
#load data in wpp2015
data(popF)
data(popM)
data(popFprojMed)
data(popMprojMed)
#combine past and future female population
df0 <- popF %>%
left_join(popFprojMed) %>%
mutate(gender = "female")
#combine past and future male population, add on female population
df1 <- popM %>%
left_join(popMprojMed) %>%
mutate(gender = "male") %>%
bind_rows(df0) %>%
mutate(age = factor(age, levels = unique(age)))
#stack data for ggplot, filter World population and required years
df2 <- df1 %>%
gather(key = year, value = pop, -country, -country_code, -age, -gender) %>%
mutate(pop = pop/1e03) %>%
filter(country == "World", year %in% c(1950, 2000, 2050, 2100))
#add extra rows and numeric age variable for geom_step used for 2100
df2 <- df2 %>%
mutate(ageno = as.numeric(age) - 0.5)
df2 <- df2 %>%
bind_rows(df2 %>% filter(year==2100, age=="100+") %>% mutate(ageno = ageno + 1))
df2
# Source: local data frame [170 x 7]
#
# country country_code age gender year pop ageno
# (fctr) (int) (fctr) (chr) (chr) (dbl) (dbl)
# 1 World 900 0-4 male 1950 171.85124 0.5
# 2 World 900 5-9 male 1950 137.99242 1.5
# 3 World 900 10-14 male 1950 133.27428 2.5
# 4 World 900 15-19 male 1950 121.69274 3.5
# 5 World 900 20-24 male 1950 112.39438 4.5
# 6 World 900 25-29 male 1950 96.59408 5.5
# 7 World 900 30-34 male 1950 83.38595 6.5
# 8 World 900 35-39 male 1950 80.59671 7.5
# 9 World 900 40-44 male 1950 73.08263 8.5
# 10 World 900 45-49 male 1950 63.13648 9.5
# .. ... ... ... ... ... ... ...
With standard ggplot
functions you can get something similar, adapting from the answer here:
library("ggplot2")
ggplot(data = df2, aes(x = age, y = pop, fill = year)) +
#bars for all but 2100
geom_bar(data = df2 %>% filter(gender == "female", year != 2100) %>% arrange(rev(year)),
stat = "identity",
position = "identity") +
geom_bar(data = df2 %>% filter(gender == "male", year != 2100) %>% arrange(rev(year)),
stat = "identity",
position = "identity",
mapping = aes(y = -pop)) +
#steps for 2100
geom_step(data = df2 %>% filter(gender == "female", year == 2100),
aes(x = ageno)) +
geom_step(data = df2 %>% filter(gender == "male", year == 2100),
aes(x = ageno, y = -pop)) +
coord_flip() +
scale_y_continuous(labels = abs)
Note: you need to do arrange(rev(year))
as the bars are overlays.
With the ggthemes
package you can get pretty close to the original Economist plot.
library("ggthemes")
ggplot(data = df2, aes(x = age, y = pop, fill = year)) +
#bars for all but 2100
geom_bar(data = df2 %>% filter(gender == "female", year != 2100) %>% arrange(rev(year)),
stat = "identity",
position = "identity") +
geom_bar(data = df2 %>% filter(gender == "male", year != 2100) %>% arrange(rev(year)),
stat = "identity",
position = "identity",
mapping = aes(y = -pop)) +
#steps for 2100
geom_step(data = df2 %>% filter(gender == "female", year == 2100),
aes(x = ageno), size = 1) +
geom_step(data = df2 %>% filter(gender == "male", year == 2100),
aes(x = ageno, y = -pop), size = 1) +
coord_flip() +
#extra style shazzaz
scale_y_continuous(labels = abs, limits = c(-400, 400), breaks = seq(-400, 400, 100)) +
geom_hline(yintercept = 0) +
theme_economist(horizontal = FALSE) +
scale_fill_economist() +
labs(fill = "", x = "", y = "")
(I am sure you can get even closer, but I have stopped here for now).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With