I have some yearly football data, and I would like to test whether certain team metrics are repeatable from one year to the next. My data is in a data.frame and looks something like this:
          y2003    y2004    y2005
Team 1 51.95455 51.00000 53.59091
Team 2 54.18182 56.31818 49.09091
Team 3 48.68182 46.86364 49.22727
Team 4 50.86364 47.68182 48.72727
What I want to be able to do is scatterplot this with "Year n" on the x-axis and "Year n+1" on the y-axis. So for example 2003 vs. 2004, 2004 vs. 2005, 2005 vs. 2006 etc. all on the same plot.
I would then like to draw a line of best fit to see how strong the correlation is, that is, whether the metric is repeatable or not.
What is the best way to do this in R with ggplot2? I can get the initial plot with:
p=ggplot(df,aes(y2003,y2004))
p + geom_point()
Then do I just have to add them all manually? Is there an inbuilt function for this sort of thing? And if I add them all one-by-one how will I get the best fit?
You want a data frame with a row for each team-year combination, containing the data for that year and the next year as well as the team name. You can actually get this without any split-apply-combine manipulation using base R functions:
(to.plot <- data.frame(yearN = unlist(df[-ncol(df)]),
                       yearNp1 = unlist(df[-1]),
                       team = rep(row.names(df), ncol(df) - 1)))
#           yearN  yearNp1  team
# y20031 51.95455 51.00000 Team1
# y20032 54.18182 56.31818 Team2
# y20033 48.68182 46.86364 Team3
# y20034 50.86364 47.68182 Team4
# y20041 51.00000 53.59091 Team1
# y20042 56.31818 49.09091 Team2
# y20043 46.86364 49.22727 Team3
# y20044 47.68182 48.72727 Team4
Basically, this code converts all but the last column of df into a vector (using unlist) and stores it in the variable yearN. The next year's values are obtained the same way, by converting all but the first column of df into a vector. Finally, the team name is a repeated sequence of the row names of df.
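To try the reshaping on the four teams shown in the question, a minimal sketch follows. How df is built here is an assumption (the question only shows its printed form), and the row names are written as Team1 to Team4 to match the output above.

```r
# Assumed reconstruction of the sample data from the question.
df <- data.frame(
  y2003 = c(51.95455, 54.18182, 48.68182, 50.86364),
  y2004 = c(51.00000, 56.31818, 46.86364, 47.68182),
  y2005 = c(53.59091, 49.09091, 49.22727, 48.72727),
  row.names = c("Team1", "Team2", "Team3", "Team4")
)

# Pair every year (all but the last column) with the following
# year (all but the first column), one row per team-year.
to.plot <- data.frame(
  yearN   = unlist(df[-ncol(df)]),
  yearNp1 = unlist(df[-1]),
  team    = rep(row.names(df), ncol(df) - 1)
)

# Sanity check: the first block of rows pairs 2003 with 2004,
# the second block pairs 2004 with 2005.
n <- nrow(df)
stopifnot(all(to.plot$yearNp1[1:n] == df$y2004))
stopifnot(all(to.plot$yearN[(n + 1):(2 * n)] == df$y2004))
```

Because unlist stacks the columns in order, the same code should keep working as more year columns are added, as long as the columns are sorted chronologically.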
Getting a line of best fit is a simple linear regression model:
(coefs <- coef(lm(yearNp1~yearN, data=to.plot)))
# (Intercept) yearN
# 28.3611927 0.4308978
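The coefficients give the line itself, but the question also asks how strong the relationship is. One way to gauge that, sketched below on an assumed reconstruction of the sample data, is to look at the Pearson correlation and the R-squared of the same model. The names fit, r and r2 are introduced here for illustration only.

```r
# Assumed reconstruction of the sample data from the question.
df <- data.frame(
  y2003 = c(51.95455, 54.18182, 48.68182, 50.86364),
  y2004 = c(51.00000, 56.31818, 46.86364, 47.68182),
  y2005 = c(53.59091, 49.09091, 49.22727, 48.72727),
  row.names = c("Team1", "Team2", "Team3", "Team4")
)
to.plot <- data.frame(
  yearN   = unlist(df[-ncol(df)]),
  yearNp1 = unlist(df[-1]),
  team    = rep(row.names(df), ncol(df) - 1)
)

# Same regression as above, kept as a model object this time.
fit <- lm(yearNp1 ~ yearN, data = to.plot)

# Pearson correlation between a year and the following year.
r <- cor(to.plot$yearN, to.plot$yearNp1)

# Share of variance in the next year explained by the current year.
r2 <- summary(fit)$r.squared

# For a regression with a single predictor, R-squared equals r squared.
stopifnot(isTRUE(all.equal(r^2, r2)))

r
r2
```

With only a handful of team-years the estimate will be noisy, so cor.test(to.plot$yearN, to.plot$yearNp1) may be worth a look for a confidence interval on the full data.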
Now ggplot can be used as usual for plotting:
library(ggplot2)
ggplot(to.plot, aes(x = yearN, y = yearNp1, col = team)) +
  geom_point() +
  geom_abline(intercept = coefs[1], slope = coefs[2])
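As an alternative to passing the coefficients by hand, ggplot2 can fit and draw the line itself with geom_smooth(method = "lm"). The sketch below is a variation rather than the approach above, and it again assumes a reconstruction of the sample data. The colour mapping is moved into geom_point so that a single line is fitted across all teams instead of one line per team.

```r
library(ggplot2)

# Assumed reconstruction of the sample data from the question.
df <- data.frame(
  y2003 = c(51.95455, 54.18182, 48.68182, 50.86364),
  y2004 = c(51.00000, 56.31818, 46.86364, 47.68182),
  y2005 = c(53.59091, 49.09091, 49.22727, 48.72727),
  row.names = c("Team1", "Team2", "Team3", "Team4")
)
to.plot <- data.frame(
  yearN   = unlist(df[-ncol(df)]),
  yearNp1 = unlist(df[-1]),
  team    = rep(row.names(df), ncol(df) - 1)
)

# Colour is mapped only in the point layer, so geom_smooth sees
# one group and fits one regression line over all team-years.
p <- ggplot(to.plot, aes(x = yearN, y = yearNp1)) +
  geom_point(aes(colour = team)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black")

p
```

Setting se = TRUE instead would add a confidence band around the line, which can help show how uncertain the fit is.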