 

ggplot2: single regression line when colour is coded by a variable?

I am trying to create a scatterplot in ggplot2 with a single regression line, even though colour depends on the 'Survey Type' variable. Ideally, I would also like to specify which colour each survey type gets (community = red, subnational = green, national = blue).

This is the code I'm running, which currently gives me three separate regression lines, one for each survey type:

ggplot(data=data.male,aes(x=mid_year, y=mean_tc, colour =condition)) +
geom_point(shape=1) + 
geom_smooth(method=lm, data=data.male, na.rm = TRUE, fullrange= TRUE) 

The condition is:

condition <- (data.male$survey_type)

If I move the colour aesthetic into the geom_point function instead, it still doesn't work: I get an error saying community is not a valid colour name.
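The error reflects the difference between mapping and setting an aesthetic: inside aes(), colour = survey_type is passed through a colour scale, but outside aes() (presumably something like geom_point(colour = condition)) the values are used as literal colours. The sketch below uses grDevices::col2rgb() as a stand-in check for what R accepts as a colour; is_colour() is a hypothetical helper, and ggplot2's own conversion path may differ in detail.

```r
# Hypothetical helper: TRUE where grDevices::col2rgb() accepts the string
# as a colour, FALSE where it raises an error.
is_colour <- function(x) {
  vapply(x, function(col) {
    tryCatch({
      grDevices::col2rgb(col)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
}

# Colour names are valid fixed colours; survey-type labels are not,
# so they only work when mapped inside aes() and passed through a scale.
is_colour(c("red", "blue", "Community"))
```

Here "red" and "blue" come back TRUE and "Community" comes back FALSE, which matches the error message.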

My actual data file is very large, so here is just a small sample:

data.male dataset:

mid_year mean_tc survey_type
2000     4       Community
2001     5       National
2002     5.1     Subnational
2003     4.3     National
2004     4.5     Community
2005     5.2     Subnational
2006     4.4     National
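To see numerically what "three separate regression lines" means, here is a base-R sketch (no ggplot2 needed) that fits one pooled line and one line per survey type on the sample rows above. It assumes these seven rows are the whole dataset; geom_smooth(method = lm) fits the same y ~ x model within each group that the colour mapping creates.

```r
data.male <- data.frame(
  mid_year    = 2000:2006,
  mean_tc     = c(4, 5, 5.1, 4.3, 4.5, 5.2, 4.4),
  survey_type = c("Community", "National", "Subnational", "National",
                  "Community", "Subnational", "National")
)

# One regression over all points: the single line the question asks for
pooled <- coef(lm(mean_tc ~ mid_year, data = data.male))

# One regression per survey type: what geom_smooth() draws when
# colour = survey_type also splits the data into groups
per_group <- sapply(split(data.male, data.male$survey_type),
                    function(d) coef(lm(mean_tc ~ mid_year, data = d)))

pooled
per_group
```

With these rows the pooled slope is slightly positive (1/28, about 0.036 per year), while the National fit slopes downward, so the per-group lines can look quite different from a single overall trend.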
Nadiah, asked May 20 '16 14:05




1 Answer

data.male <- read.table(header=TRUE,text="
 mid_year mean_tc survey_type
 2000     4       Community
 2001     5       National
 2002     5.1     Subnational
 2003     4.3     National
 2004     4.5     Community
 2005     5.2     Subnational
 2006     4.4     National")
  • Use aes(group=1) in the geom_smooth() specification to ignore the grouping by survey type induced by assigning the colour mapping to survey type. (Alternatively, you can put the colour mapping into geom_point() rather than the overall ggplot() specification.)
  • To map colour, give the name of a variable in your data frame (i.e., survey_type) rather than a separate vector; if you want the legend title to read condition, you can set that in the colour scale specification (example below).
library(ggplot2); theme_set(theme_bw())
ggplot(data=data.male,aes(x=mid_year, y=mean_tc, colour=survey_type)) +
   geom_point(shape=1) +
   ## use aes(group=1) for single regression line across groups;
   ##   don't need to re-specify data argument
   ##  set colour to black (from default blue) to avoid confusion
   ##  with national (blue) points
   geom_smooth(method=lm, na.rm = TRUE, fullrange= TRUE,
               aes(group=1),colour="black")+
   scale_colour_manual(name="condition",
       values=c("red","blue","green"))
       ## in factor level order; probably better to
       ## specify 'breaks' explicitly ...
  • Out of courtesy to colour-blind people, I would suggest not using primary red/green/blue as your colour specifications (try scale_colour_brewer(palette="Dark2") instead).
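A variant of the plot above, sketched under the assumption of a recent ggplot2 (3.x): the colour mapping lives only in geom_point(), the alternative mentioned in the first bullet, so geom_smooth() fits one line without needing aes(group=1). The manual values are a named vector (survey_cols is an illustrative name), which pins each survey type to its colour by name instead of by factor-level position, and a second version uses the ColorBrewer "Dark2" palette as the colour-blind-friendlier option.

```r
library(ggplot2)

data.male <- data.frame(
  mid_year    = 2000:2006,
  mean_tc     = c(4, 5, 5.1, 4.3, 4.5, 5.2, 4.4),
  survey_type = c("Community", "National", "Subnational", "National",
                  "Community", "Subnational", "National")
)

# Illustrative name: colours keyed by survey type, independent of level order
survey_cols <- c(Community = "red", National = "blue", Subnational = "green")

base_plot <- ggplot(data.male, aes(x = mid_year, y = mean_tc)) +
  # colour is mapped only for the points...
  geom_point(aes(colour = survey_type), shape = 1) +
  # ...so the smooth layer has no grouping and fits one line to all points
  geom_smooth(method = "lm", formula = y ~ x, fullrange = TRUE,
              na.rm = TRUE, colour = "black")

# Manual colours, legend titled "condition"
p_manual <- base_plot +
  scale_colour_manual(name = "condition", values = survey_cols)

# Colour-blind-friendlier alternative with a qualitative ColorBrewer palette
p_brewer <- base_plot +
  scale_colour_brewer(name = "condition", palette = "Dark2")
```

Print either object (e.g. p_manual) to draw it. Passing formula = y ~ x only silences the message recent ggplot2 versions emit when method = "lm" is given without a formula.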


Ben Bolker, answered Oct 16 '22 09:10