I have produced a line graph that looks something like this:

I have a data set of 50 countries and their GDP for the last 10 years.
Sample data:
Country variable value
China Y2007 3.55218e+12
USA Y2007 1.45000e+13
Japan Y2007 4.51526e+12
UK Y2007 3.06301e+12
Russia Y2007 1.29971e+12
Canada Y2007 1.46498e+12
Germany Y2007 3.43995e+12
India Y2007 1.20107e+12
France Y2007 2.66311e+12
SKorea Y2007 1.12268e+12
I generated the line graph using this code:
GDP_lineplot = ggplot(data=GDP_linechart, aes(x=variable,y=value)) +
  geom_line() +
  scale_y_continuous(name = "GDP(USD in Trillions)",
                     breaks = c(0.0e+00,5.0e+12,1.0e+13,1.5e+13),
                     labels = c(0,5,10,15)) +
  scale_x_discrete(name = "Years", labels = c(2007,"",2009,"",2011,"",2013,"",2015))
The idea is to make the graph look like this:
I tried adding
group=country, color = country
but this gives every country its own color.
How can I color only the top 4 countries individually and show the rest in a single color?
PS: I am still new to R.
When plotting subsets, the remaining countries are not included in the colour legend on the right. The alternative approach below manipulates factor levels and uses a customized colour scale to overcome this.
It is assumed that GDP_long contains the data in long format. This is in line with the data shown by the OP (GDP_linechart, but see the Data section below for differences). To manipulate factor levels, the forcats package is used (together with data.table).
library(data.table)
library(forcats)
# coerce to data.table, reorder factor levels by value in the last (= most recent) year
setDT(GDP_long)[, Country := fct_reorder(Country, -value, last)]
# create new factor which collapses all countries to "Other" except the top 4 countries
GDP_long[, top_country := fct_other(Country, keep = head(levels(Country), 4))]
library(ggplot2)
ggplot(GDP_long, aes(Year, value/1e12, group = Country, colour = top_country)) +
  # from ggplot2 3.4.0 on, use linewidth = 1 instead of size = 1 for lines
  geom_point() + geom_line(size = 1) + theme_bw() + ylab("GDP(USD in Trillions)") +
  scale_colour_manual(name = "Country",
                      values = c("green3", "orange", "blue", "red", "grey"))

The chart is now quite similar to the expected result. The lines of the top 4 countries are displayed in different colours while the other countries are displayed in grey but do appear in the colour legend to the right.
Note that the group aesthetic is still needed so that a single line is plotted for each country, while colour is controlled by the levels of top_country.
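For readers who want to see what the factor manipulation produces without installing forcats, here is a minimal base R sketch of the same idea. It is not the code used for the chart above: it swaps fct_reorder() and fct_other() for order(), ifelse() and factor(), and it only uses the 2007 sample values from the question. The names gdp, top4 and top_colours are made up for illustration.

```r
# Toy data: the sample values from the question (2007 only)
gdp <- data.frame(
  Country = c("China", "USA", "Japan", "UK", "Russia",
              "Canada", "Germany", "India", "France", "SKorea"),
  Year    = 2007L,
  value   = c(3.55218e12, 1.45000e13, 4.51526e12, 3.06301e12, 1.29971e12,
              1.46498e12, 3.43995e12, 1.20107e12, 2.66311e12, 1.12268e12),
  stringsAsFactors = FALSE
)

# Rank countries by their value in the most recent year and keep the top 4
latest <- gdp[gdp$Year == max(gdp$Year), ]
top4   <- latest$Country[order(-latest$value)][1:4]

# Collapse every other country into "Other", keeping "Other" as the last level
gdp$top_country <- factor(
  ifelse(gdp$Country %in% top4, gdp$Country, "Other"),
  levels = c(top4, "Other")
)

# Name the colours by level so they are matched by name, not by position
top_colours <- setNames(c("green3", "orange", "blue", "red", "grey"),
                        levels(gdp$top_country))
```

Passing a named vector such as top_colours as values to scale_colour_manual() ties each colour to a level name, so "Other" stays grey even if the order of the top countries changes.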
Data
The data set is too large to be reproduced here (even with dput()). The structure
str(GDP_long)
'data.frame': 1763 obs. of 3 variables:
$ Country: chr "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ Year : int 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
$ value : num 9.84e+09 1.07e+10 1.35e+11 4.01e+09 6.04e+10 ...
is similar to the OP's data, with the exception that the variable column has already been converted to an integer column Year. This gives a nicely formatted x-axis without additional effort.
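The conversion itself is not shown in the answer. One possible way to do it, assuming the labels always have the form "Y" followed by a four-digit year as in the OP's sample, is sketched below. The vector here is a stand-in for the OP's variable column, which may be a factor or a character column depending on how the data was reshaped.

```r
# Stand-in for the OP's `variable` column (assumed format: "Y" + year)
variable <- factor(c("Y2007", "Y2008", "Y2015"))

# Strip the leading "Y" and convert the remainder to integer;
# as.character() makes this work for factor and character input alike
Year <- as.integer(sub("^Y", "", as.character(variable)))
Year
# 2007 2008 2015
```

With an integer Year on the x-axis, ggplot2 uses a continuous scale, so the manual scale_x_discrete() labels from the question are no longer needed.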