How to add trend line in a log-log plot (ggplot2)?

Tags:

r

ggplot2

I need to plot a data vector that follows a power-law distribution, so on log-log axes the binned counts should fall on a straight line. However, I do not know how to add a trend line without explicitly providing a "y" aesthetic. This is my code:

library("poweRlaw")
library("ggplot2")

xmin = 1; alpha = 1.5
con_rns = rplcon(1000, xmin, alpha)
#convert to data.frame format for ggplot2
df <- data.frame(con_rns =con_rns[con_rns<1000])

#make plot with both axes log scale
ggplot(data = df, aes(x = con_rns))+
  geom_point(stat = 'bin', binwidth = 0.1)+
  geom_smooth(stat = 'bin',mapping = aes(x=con_rns),method = "lm",se=FALSE)+
  scale_x_log10() + 
  scale_y_log10()

The result is this:

[image]

But I want this:

[image]

I know I can manually bin the data, provide "y" explicitly, and then plot the line, like this:

ggplot(data = data.frame(a = rnorm(50,0,1),b=5+rnorm(50,2,1)),mapping = aes(x = a,y=b))+
  geom_point()+
  geom_smooth(method = "lm",se=FALSE)

Result:

[image]

But I want to know how to plot a trend line with this code (geom_point(stat = 'bin', binwidth = 0.1)), which bins the data implicitly.

PS: Thanks for Chris's answer. I still have a problem: how can I draw this for different groups? The data are df <- data.frame(con_rns =con_rns[con_rns<1000],col=sample(1:3,size = length(con_rns[con_rns<1000]),replace = T)) . How can I plot the points and the trend line of each group in a different color on log-log axes?

asked Nov 18 '18 12:11 by BigMOoO


1 Answer

One way is to recover the binned data from the plot using ggplot_build().

First, I made the plot without the line of best fit:

p <- ggplot(data = df, aes(x = con_rns))+
  geom_point(stat = 'bin', binwidth = 0.1)+
  scale_x_log10() + 
  scale_y_log10() 

Then I added a smooth layer that uses the binned data from the plot, which can be found with ggplot_build(p)$data. The built data are stored on the log10 scale, so I reversed the transformation with 10^x and 10^y:

p + geom_smooth(data = ggplot_build(p)$data[[1]], 
              mapping = aes(x=10^x, y= 10^y),method = "lm",se=FALSE)
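To see why the 10^x and 10^y back-transformation is needed, it can help to look at the binned data directly. The following is a minimal sketch, not part of the original answer. It assumes that layer_data(p, 1) can stand in for ggplot_build(p)$data[[1]], and it replaces rplcon() with an inverse-transform sampler so that it runs without poweRlaw. The names binned, nonempty and fit are illustrative only.

```r
library(ggplot2)

set.seed(42)
xmin <- 1; alpha <- 1.5
# Assumption: stand-in for poweRlaw::rplcon(), sampling a continuous
# power law by inverse-transform sampling
con_rns <- xmin * runif(1000)^(-1 / (alpha - 1))
df <- data.frame(con_rns = con_rns[con_rns < 1000])

p <- ggplot(data = df, aes(x = con_rns)) +
  geom_point(stat = "bin", binwidth = 0.1) +
  scale_x_log10() +
  scale_y_log10()

# Data of the first layer, as computed by the bin stat
# (empty bins may trigger a warning about infinite values)
binned <- layer_data(p, 1)
print(head(binned[, c("x", "y", "count")]))

# x and y are on the log10 scale; count holds the raw bin counts
nonempty <- binned[binned$count > 0, ]

# A straight-line fit on the log10 scale; for power-law data the
# slope is expected to be negative
fit <- lm(y ~ x, data = nonempty)
print(coef(fit))
```

Because x and y in the built data are already log10 values, fitting lm(y ~ x) on them gives a straight-line fit in log-log space, which should correspond to the line that geom_smooth(method = "lm") draws in the answer's code.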


UPDATE: The additional problem was how to split the plot by colour groups. I approached this in the same way, but I had to add a 'group' aesthetic so that the grouping would be kept in the ggplot_build() data.

library(poweRlaw)
library(ggplot2)

xmin = 1; alpha = 1.5
con_rns = rplcon(1000, xmin, alpha)
#convert to data.frame format for ggplot2
df <- data.frame(con_rns = con_rns[con_rns < 1000],
                 col = sample(1:3, size = length(con_rns[con_rns < 1000]), replace = TRUE))

p <- ggplot(data = df, aes(x = con_rns))+
  geom_point(stat = 'bin', binwidth = 0.1, aes(colour=factor(col), group=factor(col)))+
  scale_x_log10() + 
  scale_y_log10() 


p + geom_smooth(data = ggplot_build(p)$data[[1]], 
                mapping = aes(x=10^x, y= 10^y, colour=factor(group)),method = "lm",se=FALSE)

Note that now that the data are grouped, some groups have a count of zero in some bins. Since log10(0) is -Inf, the log10 transformation raises a warning for these bins. These points are removed from the plot and ignored in the trend lines.
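If the warning from the trend layer is distracting, one option is to drop the empty bins before passing the data to geom_smooth(). This is a hedged sketch rather than part of the original answer. It assumes layer_data(p, 1) is equivalent to ggplot_build(p)$data[[1]] and again uses an inverse-transform sampler in place of rplcon(). The names keep, binned and p_trend are illustrative. The point layer may still warn about the same empty bins when the plot is drawn.

```r
library(ggplot2)

set.seed(42)
xmin <- 1; alpha <- 1.5
# Assumption: stand-in for poweRlaw::rplcon()
con_rns <- xmin * runif(1000)^(-1 / (alpha - 1))
keep <- con_rns < 1000
df <- data.frame(con_rns = con_rns[keep],
                 col = sample(1:3, size = sum(keep), replace = TRUE))

p <- ggplot(data = df, aes(x = con_rns)) +
  geom_point(stat = "bin", binwidth = 0.1,
             aes(colour = factor(col), group = factor(col))) +
  scale_x_log10() +
  scale_y_log10()

# Binned data of the point layer; empty bins have y = -Inf
binned <- suppressWarnings(layer_data(p, 1))

# Keep only bins with a finite log10 count
binned <- binned[is.finite(binned$y), ]

# One trend line per group, fitted on the non-empty bins only
p_trend <- p +
  geom_smooth(data = binned,
              mapping = aes(x = 10^x, y = 10^y, colour = factor(group)),
              method = "lm", formula = y ~ x, se = FALSE)
```

Printing p_trend draws the plot. The fitted lines should match those of the unfiltered version, since the empty bins were being dropped from the fit anyway.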


answered Sep 30 '22 23:09 by Chris