Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you get geom_map to show all parts of a map?

Tags:

plot

r

ggplot2

I've just started using the geom_map function in ggplot2. After having read the 29 posts I could find on geom_map on here, I'm still running into the same problem.

My data frame is ridiculously big and contains over 2000 rows. It's basically data on a specific gene (TP53) that's been compiled from the WHO.

Please download it from here.

The header looks as follows:

> head(ARCTP53_SOExample)
  Mutation_ID MUT_ID hg18_Chr17_coordinates hg19_Chr17_coordinates ExonIntron Genomic_nt Codon_number
1          16   1789                7519192                7578467     5-exon      12451          155
2          13   1741                7519200                7578475     5-exon      12443          152
3          17   2143                7519131                7578406     5-exon      12512          175
4          14   2143                7519131                7578406     5-exon      12512          175
5          15   2168                7519128                7578403     5-exon      12515          176
6          12   3737                7517845                7577120     8-exon      13798          273
  Description c_description g_description       g_description_hg18 WT_nucleotide Mutant_nucleotide
1         A>G      c.463A>G  g.7578467T>C NC_000017.9:g.7519192T>C           A                   G
2         C>T      c.455C>T  g.7578475G>A NC_000017.9:g.7519200G>A           C                   T
3         G>A      c.524G>A  g.7578406C>T NC_000017.9:g.7519131C>T           G                   A
4         G>A      c.524G>A  g.7578406C>T NC_000017.9:g.7519131C>T           G                   A
5         G>T      c.527G>T  g.7578403C>A NC_000017.9:g.7519128C>A           G                   T
6         G>A      c.818G>A  g.7577120C>T NC_000017.9:g.7517845C>T           G                   A
  Splice_site CpG_site           Type Mut_rate WT_codon Mutant_codon WT_AA Mutant_AA ProtDescription
1          no       no        A:T>G:C    0.170      ACC          GCC   Thr       Ala         p.T155A
2          no      yes G:C>A:T at CpG    1.243      CCG          CTG   Pro       Leu         p.P152L
3          no      yes G:C>A:T at CpG    1.280      CGC          CAC   Arg       His         p.R175H
4          no      yes G:C>A:T at CpG    1.280      CGC          CAC   Arg       His         p.R175H
5          no       no        G:C>T:A    0.054      TGC          TTC   Cys       Phe         p.C176F
6          no      yes G:C>A:T at CpG    1.335      CGT          CAT   Arg       His         p.R273H
  Mut_rateAA   Effect Structural_motif Putative_stop Sample_Name Sample_ID Sample_source Tumor_origin Grade
1      0.170 missense NDBL/beta-sheets             0    CAS91-19        17       surgery      primary      
2      1.243 missense NDBL/beta-sheets             0     CAS91-4        14       surgery      primary      
3      1.280 missense            L2/L3             0    CAS91-13        12       surgery      primary      
4      1.280 missense            L2/L3             0     CAS91-5        15       surgery      primary      
5      0.054 missense            L2/L3             0     CAS91-1        16       surgery      primary      
6      1.335 missense          L1/S/H2             0     CAS91-3        13       surgery      primary      
  Stage TNM p53_IHC KRAS_status Other_mutations Other_associations
1              <NA>        <NA>            <NA>                   
2              <NA>        <NA>            <NA>                   
3              <NA>        <NA>            <NA>                   
4              <NA>        <NA>            <NA>                   
5              <NA>        <NA>            <NA>                   
6              <NA>        <NA>            <NA>                   
                                                                 Add_Info Individual_ID  Sex Age Ethnicity
1 Mutation only present in adjacent dysplastic area (Barrett's esophagus)            17 <NA>  NA          
2 Mutation only present in adjacent dysplastic area (Barrett's esophagus)            14 <NA>  NA          
3 Mutation only present in adjacent dysplastic area (Barrett's esophagus)            12 <NA>  NA          
4 Mutation only present in adjacent dysplastic area (Barrett's esophagus)            15 <NA>  NA          
5                                                                                    16 <NA>  NA          
6      Mutation absent from adjacent dysplasia area (Barrett's esophagus)            13 <NA>  NA          
  Geo_area Country            Development       Population   Region TP53polymorphism Germline_mutation
1              USA More developed regions Northern America Americas                                 NA
2              USA More developed regions Northern America Americas                                 NA
3              USA More developed regions Northern America Americas                                 NA
4              USA More developed regions Northern America Americas                                 NA
5              USA More developed regions Northern America Americas                                 NA
6              USA More developed regions Northern America Americas                                 NA
  Family_history Tobacco Alcohol Exposure Infectious_agent Ref_ID Cross_Ref_ID  PubMed Exclude_analysis
1                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
2                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
3                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
4                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
5                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
6                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
  WGS_WXS
1      No
2      No
3      No
4      No
5      No
6      No

In any case, I want to create a simple world-map which will color the countries, where this mutation has been studied, and if more or less "mutation signatures" come from these countries.

You might understand better what I'm trying to do if you see this:

summary(ARCTP53_SOExample$Country)
Australia                  Brazil                  Canada                   China 
                      1                     127                      76                     519 
       China, Hong-Kong Chinese Taipei (Taiwan)          Czech Republic                   Egypt 
                     52                      36                       9                       9 
                 France                 Germany                   India                    Iran 
                    195                      10                      63                     112 
                Ireland                   Italy                   Japan                   Kenya 
                     25                      30                     414                      11 
           South Africa                   Spain             Switzerland                Thailand 
                     13                       2                      24                      35 
        The Netherlands                      UK                 Uruguay                     USA 
                      6                      17                       6                     189 
                   NA's 
                     30 

So some countries come up multiple times in my data.frame.

So this is what I did in the hope of getting my desired map:

library(ggplot2)
library(maps)
world_map<-map_data("world")
ggplot(ARCTP53_SOExample)+geom_map(map = world_map, aes(map_id = Country,fill = Country),
+ colour = "black") +
+ expand_limits(x = world_map$long, y = world_map$lat)

And this is what I get: This map only contains the countries in my list...

Does anyone have any input on what I'm doing wrong?

Further, what I'd then like to do down the road, is add geom_bar() of the ExonIntron column onto the different countries. But, I'd like to try and generate the right map first?

Thanks a mill.

like image 250
OFish Avatar asked Apr 04 '14 06:04

OFish


People also ask

How do I show the world map in R?

To create a world map using it we will use the geom_map() function of the ggplot2 package of the R Language.

How do I create a map in ggplot2?

The easiest way to create spatial plots with ggplot is to use the geom_sf() function. By default there is no aesthetic mapping, but we can use arguments like fill to easily create choropleth maps.

What GEOM is used to draw geographic borders using Ggplot?

Draw the boundaries ggplot2 contains a geometric object specifically for simple feature objects called geom_sf() . This works reasonably well when you need to draw polygons, like our state boundaries.


1 Answers

The missing countries in the ARC… data frame == missing regions on the map which can be compensated for with base layer made from the world_map data frame:

library(maps)

world_map<-map_data("world")

gg <- ggplot(ARCTP53_SOExample)

# need one layer with ALL THE THINGS (well, all the regions)
gg <- gg + geom_map(dat=world_map, map = world_map, 
                    aes(map_id=region), fill="white", color="black")

# now we can put the layer we really want
gg <- gg + geom_map(map = world_map, 
                    aes(map_id = Country, fill = Country), colour = "black")

gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg

map1

I removed the legend since using a choropleth kinda assumes folks know geography.

NOTE: Using a distinct color per-region (country) is really not a good idea. Since you're really trying to only highlight where the mutation has been studied, a single color should suffice:

gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region), 
                    fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country), 
                    fill = "steelblue", colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg

map2

Since you eventually want to tell the story of ExonIntron, you might want to consider using that as the color of the choropleth. I know nothing of genes, so I don't know if a gradient makes sense or if distinct colors are the way to go. I will posit that the the plethora of distinct colors created by the following code makes me think that you might want to do one gradient scale for intron and one for extron. Again, I'm not a gene person.

gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region), 
                    fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country, fill = ExonIntron), 
                    colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg

map3

Some of the colors are either in really tiny regions, or they are in regions whose names don't match the names in world_map$region. You'll probably want to take a look at that. This:

wm.reg <- unique(as.character(world_map$region))
arc.reg <- unique(as.character(ARCTP53_SOExample$Country))

arc.reg %in% wm.reg
##  [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
## [14]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE

kinda shows some are missing.

You might also want to consider doing the legend differently (i.e. put it on the bottom) if you go with a legend vs build your own table of results.

UPDATE

I almost forgot. Since you (most likely) have no need to Antarctica, you should get rid of it since it's eating up quite a bit of valuable space:

world_map <- subset(world_map, region!="Antarctica")

gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region), 
                    fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country, fill = ExonIntron), 
                    colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg

map4

(NOTE: I got rid of the legend since I really think you should re-think how you want the colors on the map and then use an extra table or plot to act as the legend)


FINAL UPDATE (per OP request in the comments below)

library(ggplot2)
library(maps)
library(plyr)
library(gridExtra)

ARCTP53_SOExample <- read.csv("dat.csv")

# reduce all the distinct exon/introns to just exon or intron

ARCTP53_SOExample$EorI <- factor(ifelse(grepl("exon", 
                                              ARCTP53_SOExample$ExonIntron, 
                                              ignore.case = TRUE), 
                                        "exon", "intron"))

# extract summary data for the two variables we care about for the map

arc.combined <- count(ARCTP53_SOExample, .(Country, EorI))
colnames(arc.combined) <- c("region", "EorI", "ei.ct")

# get total for country (region) and add to the summary info

arc.combined <- merge(arc.combined, count(arc.combined, .(region), wt_var=.(ei.ct)))
colnames(arc.combined) <- c("region", "EorI", "ei.ct", "region.total")

# it wasn't specified if the "EorI" is going to be used on the map so 
# we won't use it below (but we could, now)

# get map and intercourse Antarctica

world_map <- map_data("world")
world_map <- subset(world_map, region!="Antarctica")

# this will show the counts by country with all of the "chart junk" removed
# and the "counts" scaled as a gradient, and with the legend at the top

gg <- ggplot(arc.combined)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region), 
                    fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = region.total), size=0.25)
gg <- gg + scale_fill_gradient(low="#fff7bc", high="#cc4c02", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg

mapb

# BUT you might want to show the counts by intron/exon by country
# SO we do a separate map for each factor and combine them
# with some grid magic. This provides more granular control over
# each choropleth (in the event one wanted to tweak one or the other)

# exon

gg <- ggplot(arc.combined[arc.combined$EorI=="exon",])
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region), 
                    fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = ei.ct), size=0.25)
gg <- gg + scale_fill_gradient(low="#f7fcb9", high="#238443", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by 'exon' & country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg.exon <- gg

# intron

gg <- ggplot(arc.combined[arc.combined$EorI=="intron",])
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region), 
                    fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = ei.ct), 
                    colour = "#7f7f7f", size=0.25)
gg <- gg + scale_fill_gradient(low="#ece7f2", high="#0570b0", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by 'intron' & country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg.intron <- gg

# use some grid magic to combine them into one plot

grid.arrange(gg.exon, gg.intron, ncol=1)

mapb

like image 114
hrbrmstr Avatar answered Sep 18 '22 12:09

hrbrmstr