I've just started using the geom_map
function in ggplot2
. After having read the 29 posts I could find on geom_map
on here, I'm still running into the same problem.
My data frame is ridiculously big and contains over 2000 rows. It's basically data on a specific gene (TP53) that's been compiled from the WHO.
Please download it from here.
The header looks as follows:
> head(ARCTP53_SOExample)
Mutation_ID MUT_ID hg18_Chr17_coordinates hg19_Chr17_coordinates ExonIntron Genomic_nt Codon_number
1 16 1789 7519192 7578467 5-exon 12451 155
2 13 1741 7519200 7578475 5-exon 12443 152
3 17 2143 7519131 7578406 5-exon 12512 175
4 14 2143 7519131 7578406 5-exon 12512 175
5 15 2168 7519128 7578403 5-exon 12515 176
6 12 3737 7517845 7577120 8-exon 13798 273
Description c_description g_description g_description_hg18 WT_nucleotide Mutant_nucleotide
1 A>G c.463A>G g.7578467T>C NC_000017.9:g.7519192T>C A G
2 C>T c.455C>T g.7578475G>A NC_000017.9:g.7519200G>A C T
3 G>A c.524G>A g.7578406C>T NC_000017.9:g.7519131C>T G A
4 G>A c.524G>A g.7578406C>T NC_000017.9:g.7519131C>T G A
5 G>T c.527G>T g.7578403C>A NC_000017.9:g.7519128C>A G T
6 G>A c.818G>A g.7577120C>T NC_000017.9:g.7517845C>T G A
Splice_site CpG_site Type Mut_rate WT_codon Mutant_codon WT_AA Mutant_AA ProtDescription
1 no no A:T>G:C 0.170 ACC GCC Thr Ala p.T155A
2 no yes G:C>A:T at CpG 1.243 CCG CTG Pro Leu p.P152L
3 no yes G:C>A:T at CpG 1.280 CGC CAC Arg His p.R175H
4 no yes G:C>A:T at CpG 1.280 CGC CAC Arg His p.R175H
5 no no G:C>T:A 0.054 TGC TTC Cys Phe p.C176F
6 no yes G:C>A:T at CpG 1.335 CGT CAT Arg His p.R273H
Mut_rateAA Effect Structural_motif Putative_stop Sample_Name Sample_ID Sample_source Tumor_origin Grade
1 0.170 missense NDBL/beta-sheets 0 CAS91-19 17 surgery primary
2 1.243 missense NDBL/beta-sheets 0 CAS91-4 14 surgery primary
3 1.280 missense L2/L3 0 CAS91-13 12 surgery primary
4 1.280 missense L2/L3 0 CAS91-5 15 surgery primary
5 0.054 missense L2/L3 0 CAS91-1 16 surgery primary
6 1.335 missense L1/S/H2 0 CAS91-3 13 surgery primary
Stage TNM p53_IHC KRAS_status Other_mutations Other_associations
1 <NA> <NA> <NA>
2 <NA> <NA> <NA>
3 <NA> <NA> <NA>
4 <NA> <NA> <NA>
5 <NA> <NA> <NA>
6 <NA> <NA> <NA>
Add_Info Individual_ID Sex Age Ethnicity
1 Mutation only present in adjacent dysplastic area (Barrett's esophagus) 17 <NA> NA
2 Mutation only present in adjacent dysplastic area (Barrett's esophagus) 14 <NA> NA
3 Mutation only present in adjacent dysplastic area (Barrett's esophagus) 12 <NA> NA
4 Mutation only present in adjacent dysplastic area (Barrett's esophagus) 15 <NA> NA
5 16 <NA> NA
6 Mutation absent from adjacent dysplasia area (Barrett's esophagus) 13 <NA> NA
Geo_area Country Development Population Region TP53polymorphism Germline_mutation
1 USA More developed regions Northern America Americas NA
2 USA More developed regions Northern America Americas NA
3 USA More developed regions Northern America Americas NA
4 USA More developed regions Northern America Americas NA
5 USA More developed regions Northern America Americas NA
6 USA More developed regions Northern America Americas NA
Family_history Tobacco Alcohol Exposure Infectious_agent Ref_ID Cross_Ref_ID PubMed Exclude_analysis
1 <NA> <NA> <NA> <NA> 4 NA 1868473 False
2 <NA> <NA> <NA> <NA> 4 NA 1868473 False
3 <NA> <NA> <NA> <NA> 4 NA 1868473 False
4 <NA> <NA> <NA> <NA> 4 NA 1868473 False
5 <NA> <NA> <NA> <NA> 4 NA 1868473 False
6 <NA> <NA> <NA> <NA> 4 NA 1868473 False
WGS_WXS
1 No
2 No
3 No
4 No
5 No
6 No
In any case, I want to create a simple world-map which will color the countries, where this mutation has been studied, and if more or less "mutation signatures" come from these countries.
You might understand better what I'm trying to do if you see this:
summary(ARCTP53_SOExample$Country)
Australia Brazil Canada China
1 127 76 519
China, Hong-Kong Chinese Taipei (Taiwan) Czech Republic Egypt
52 36 9 9
France Germany India Iran
195 10 63 112
Ireland Italy Japan Kenya
25 30 414 11
South Africa Spain Switzerland Thailand
13 2 24 35
The Netherlands UK Uruguay USA
6 17 6 189
NA's
30
So some countries come up multiple times in my data.frame
.
So this is what I did in the hope of getting my desired map:
library(ggplot2)
library(maps)
world_map<-map_data("world")
ggplot(ARCTP53_SOExample)+geom_map(map = world_map, aes(map_id = Country,fill = Country),
+ colour = "black") +
+ expand_limits(x = world_map$long, y = world_map$lat)
And this is what I get:
Does anyone have any input on what I'm doing wrong?
Further, what I'd then like to do down the road, is add geom_bar()
of the ExonIntron
column onto the different countries. But, I'd like to try and generate the right map first?
Thanks a mill.
To create a world map using it we will use the geom_map() function of the ggplot2 package of the R Language.
The easiest way to create spatial plots with ggplot is to use the geom_sf() function. By default there is no aesthetic mapping, but we can use arguments like fill to easily create choropleth maps.
Draw the boundaries ggplot2 contains a geometric object specifically for simple feature objects called geom_sf() . This works reasonably well when you need to draw polygons, like our state boundaries.
The missing countries in the ARC…
data frame == missing regions on the map which can be compensated for with base layer made from the world_map
data frame:
library(maps)
world_map<-map_data("world")
gg <- ggplot(ARCTP53_SOExample)
# need one layer with ALL THE THINGS (well, all the regions)
gg <- gg + geom_map(dat=world_map, map = world_map,
aes(map_id=region), fill="white", color="black")
# now we can put the layer we really want
gg <- gg + geom_map(map = world_map,
aes(map_id = Country, fill = Country), colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg
I removed the legend since using a choropleth kinda assumes folks know geography.
NOTE: Using a distinct color per-region (country) is really not a good idea. Since you're really trying to only highlight where the mutation has been studied, a single color should suffice:
gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country),
fill = "steelblue", colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg
Since you eventually want to tell the story of ExonIntron
, you might want to consider using that as the color of the choropleth. I know nothing of genes, so I don't know if a gradient makes sense or if distinct colors are the way to go. I will posit that the the plethora of distinct colors created by the following code makes me think that you might want to do one gradient scale for intron
and one for extron
. Again, I'm not a gene person.
gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country, fill = ExonIntron),
colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg
Some of the colors are either in really tiny regions, or they are in regions whose names don't match the names in world_map$region
. You'll probably want to take a look at that. This:
wm.reg <- unique(as.character(world_map$region))
arc.reg <- unique(as.character(ARCTP53_SOExample$Country))
arc.reg %in% wm.reg
## [1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## [14] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE
kinda shows some are missing.
You might also want to consider doing the legend differently (i.e. put it on the bottom) if you go with a legend vs build your own table of results.
UPDATE
I almost forgot. Since you (most likely) have no need to Antarctica, you should get rid of it since it's eating up quite a bit of valuable space:
world_map <- subset(world_map, region!="Antarctica")
gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country, fill = ExonIntron),
colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg
(NOTE: I got rid of the legend since I really think you should re-think how you want the colors on the map and then use an extra table or plot to act as the legend)
FINAL UPDATE (per OP request in the comments below)
library(ggplot2)
library(maps)
library(plyr)
library(gridExtra)
ARCTP53_SOExample <- read.csv("dat.csv")
# reduce all the distinct exon/introns to just exon or intron
ARCTP53_SOExample$EorI <- factor(ifelse(grepl("exon",
ARCTP53_SOExample$ExonIntron,
ignore.case = TRUE),
"exon", "intron"))
# extract summary data for the two variables we care about for the map
arc.combined <- count(ARCTP53_SOExample, .(Country, EorI))
colnames(arc.combined) <- c("region", "EorI", "ei.ct")
# get total for country (region) and add to the summary info
arc.combined <- merge(arc.combined, count(arc.combined, .(region), wt_var=.(ei.ct)))
colnames(arc.combined) <- c("region", "EorI", "ei.ct", "region.total")
# it wasn't specified if the "EorI" is going to be used on the map so
# we won't use it below (but we could, now)
# get map and intercourse Antarctica
world_map <- map_data("world")
world_map <- subset(world_map, region!="Antarctica")
# this will show the counts by country with all of the "chart junk" removed
# and the "counts" scaled as a gradient, and with the legend at the top
gg <- ggplot(arc.combined)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = region.total), size=0.25)
gg <- gg + scale_fill_gradient(low="#fff7bc", high="#cc4c02", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg
# BUT you might want to show the counts by intron/exon by country
# SO we do a separate map for each factor and combine them
# with some grid magic. This provides more granular control over
# each choropleth (in the event one wanted to tweak one or the other)
# exon
gg <- ggplot(arc.combined[arc.combined$EorI=="exon",])
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = ei.ct), size=0.25)
gg <- gg + scale_fill_gradient(low="#f7fcb9", high="#238443", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by 'exon' & country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg.exon <- gg
# intron
gg <- ggplot(arc.combined[arc.combined$EorI=="intron",])
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = ei.ct),
colour = "#7f7f7f", size=0.25)
gg <- gg + scale_fill_gradient(low="#ece7f2", high="#0570b0", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by 'intron' & country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg.intron <- gg
# use some grid magic to combine them into one plot
grid.arrange(gg.exon, gg.intron, ncol=1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With