I have the following data, which I plot with the ggplot code below:
data <- structure(list(country_mean_rep = structure(c(73.6995708154506,
93.5501285347044, 85.1529051987768, 91.1017369727047, 79.5562130177515,
84.6751054852321, 89.8, 86.8826405867971, 94.2247191011236, 70.2321428571429,
88.4107142857143), label = "label", format.stata = "%9.2f"),
country_mean_crime = c(0.0944206008583691, 0.0565552699228792,
0.0336391437308868, 0.205955334987593, 0.130177514792899,
0.282700421940928, 0.220512820512821, 0.415647921760391,
0.387640449438202, 0.200892857142857, 0.292207792207792),
country_name = structure(c(1L, 2L, 3L, 4L, 5L, 7L, 11L, 12L,
14L, 16L, 20L), .Label = c("Albania", "Armenia", "Azerbaijan",
"Belarus", "Bosnia and Herzegovina", "Brazil", "Bulgaria",
"Cambodia", "Chile", "CostaRica", "Croatia", "Czech", "Ecuador",
"Estonia", "FYROM", "Georgia", "Germany", "Greece", "Guyana",
"Hungary", "Ireland", "Kazakhstan", "Kenya", "Kyrgyzstan",
"Latvia", "Lithuania", "Malawi", "Mali", "Moldova", "Philippines",
"Poland", "Portugal", "Romania", "Russia", "Senegal", "Serbia&Montenegro",
"Slovakia", "Slovenia", "South Africa", "South Korea", "Spain",
"SriLanka", "Tajikistan", "Turkey", "Ukraine", "Uzbekistan",
"Vietnam"), class = "factor")), row.names = c(NA, -11L), class = c("data.table",
"data.frame"))
I would like to run the following code on it:
library(ggplot2)

ggplot(data, aes(x = country_mean_rep, y = country_mean_crime)) +
  geom_point() +
  geom_smooth(aes(colour = "linear", fill = "linear"),
              method = "lm",
              formula = y ~ x) +
  geom_smooth(aes(colour = "quadratic", fill = "quadratic"),
              method = "lm",
              formula = y ~ x + I(x^2)) +
  geom_smooth(aes(colour = "cubic", fill = "cubic"),
              method = "lm",
              formula = y ~ x + I(x^2) + I(x^3)) +
  labs(colour = "Functional Form", fill = "Functional Form") +
  geom_text(aes(label = country_name), nudge_y = 0.02) +
  theme_bw()
Now let's say that the Czech Republic is an outlier that I want to exclude from the fits (especially the linear one). I understand that there is nothing wrong with the Czech Republic in this example; I need to know how to do this for a genuine outlier in my actual data.
Is there some way of excluding it only from the fit, while keeping the dot in the plot?
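One way to do it is to give the layers different data: fit the smooths on a subset that excludes the outlier, and pass the full data to the point and text layers: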
library(ggplot2)

# The smooths inherit the subsetted data (without the outlier) from ggplot()
ggplot(subset(data, country_name != "Czech"),
       aes(x = country_mean_rep, y = country_mean_crime)) +
  geom_smooth(aes(colour = "linear", fill = "linear"),
              method = "lm",
              formula = y ~ x) +
  geom_smooth(aes(colour = "quadratic", fill = "quadratic"),
              method = "lm",
              formula = y ~ x + I(x^2)) +
  geom_smooth(aes(colour = "cubic", fill = "cubic"),
              method = "lm",
              formula = y ~ x + I(x^2) + I(x^3)) +
  labs(colour = "Functional Form", fill = "Functional Form") +
  # Points and labels are drawn from the full data, outlier included
  geom_point(data = data, inherit.aes = FALSE,
             aes(x = country_mean_rep, y = country_mean_crime)) +
  geom_text(data = data, inherit.aes = FALSE,
            aes(label = country_name, x = country_mean_rep, y = country_mean_crime),
            nudge_y = 0.02) +
  theme_bw()
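In this case, the three linear models are fitted on the subsetted data, whereas the calls to geom_point and geom_text are given the full data explicitly (data = data) and, with inherit.aes = FALSE, do not inherit the original aesthetics but define their own.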
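If you would rather keep the full data in the main ggplot() call, a possible variation is to subset only inside the smooth layers. The data argument of a layer can be a function, which is called with the plot's data and should return the data frame that layer uses. The sketch below assumes that approach; drop_outlier is a hypothetical helper name, and the data frame is a stand-in for the question's data with values rounded to keep it short.

```r
library(ggplot2)

# Stand-in for the question's data (values rounded from the dput output above)
data <- data.frame(
  country_mean_rep   = c(73.70, 93.55, 85.15, 91.10, 79.56, 84.68,
                         89.80, 86.88, 94.22, 70.23, 88.41),
  country_mean_crime = c(0.094, 0.057, 0.034, 0.206, 0.130, 0.283,
                         0.221, 0.416, 0.388, 0.201, 0.292),
  country_name       = c("Albania", "Armenia", "Azerbaijan", "Belarus",
                         "Bosnia and Herzegovina", "Bulgaria", "Croatia",
                         "Czech", "Estonia", "Georgia", "Hungary")
)

# Hypothetical helper: removes the rows that should not influence the fits
drop_outlier <- function(d) subset(d, country_name != "Czech")

p <- ggplot(data, aes(x = country_mean_rep, y = country_mean_crime)) +
  geom_point() +                                  # full data, outlier included
  geom_smooth(aes(colour = "linear", fill = "linear"),
              data = drop_outlier,                # fit without the outlier
              method = "lm", formula = y ~ x) +
  geom_smooth(aes(colour = "quadratic", fill = "quadratic"),
              data = drop_outlier,
              method = "lm", formula = y ~ x + I(x^2)) +
  geom_smooth(aes(colour = "cubic", fill = "cubic"),
              data = drop_outlier,
              method = "lm", formula = y ~ x + I(x^2) + I(x^3)) +
  labs(colour = "Functional Form", fill = "Functional Form") +
  geom_text(aes(label = country_name), nudge_y = 0.02) +  # full data
  theme_bw()

p
```

With this layout the point and text layers need neither data = data nor inherit.aes = FALSE, since they simply inherit everything from ggplot(). You can check which rows a layer actually received with layer_data(p, i), where i is the layer's position in the plot.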