I need to plot a data vector that follows a power-law distribution, so on log-log axes the points should fall on a straight line. However, I do not know how to add the trend line unless I explicitly provide a "y" aesthetic. This is my code:
library("poweRlaw")
library("ggplot2")
xmin = 1; alpha = 1.5
con_rns = rplcon(1000, xmin, alpha)
#convert to data.frame format for ggplot2
df <- data.frame(con_rns =con_rns[con_rns<1000])
#make plot with both axes log scale
ggplot(data = df, aes(x = con_rns))+
geom_point(stat = 'bin', binwidth = 0.1)+
geom_smooth(stat = 'bin',mapping = aes(x=con_rns),method = "lm",se=FALSE)+
scale_x_log10() +
scale_y_log10()
The result is this:
But I want this
I know I can bin the data manually, provide "y" explicitly, and then plot the line, like this:
ggplot(data = data.frame(a = rnorm(50,0,1),b=5+rnorm(50,2,1)),mapping = aes(x = a,y=b))+
geom_point()+
geom_smooth(method = "lm",se=FALSE)
result:
But I want to know how I can plot a trend line with this code (geom_point(stat = 'bin', binwidth = 0.1)), which computes the bins implicitly.
PS:
Thanks for Chris's answer. I still have a problem: how can I draw the plot for different groups? The data are df <- data.frame(con_rns =con_rns[con_rns<1000],col=sample(1:3,size = length(con_rns[con_rns<1000]),replace = TRUE)). How can I plot the points and the trend lines coloured by group on log-log axes? Like this:
To create a log-log plot in base R, pass log(data) instead of data to the plot() function. The log() function converts each value to its logarithm; by default it computes the natural logarithm.
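As a minimal sketch of the base R approach described above (the data and variable names are made up for illustration), the following draws the same values two ways: by passing log-transformed values to plot(), and by using plot()'s log = "xy" argument, which keeps the original values on the tick labels.

```r
# Illustrative data: y follows an exact power law in x (hypothetical values)
x <- 1:100
y <- 5 * x^(-1.5)

# Option 1: transform the data; the axes show natural-log values
plot(log(x), log(y), main = "log(data) passed to plot()")

# Option 2: let plot() draw logarithmic axes; ticks show the original values
plot(x, y, log = "xy", main = "plot(..., log = 'xy')")

# On log-log axes a power law is a straight line whose slope is the exponent
fit <- lm(log(y) ~ log(x))
slope <- coef(fit)[["log(x)"]]  # -1.5 up to floating-point error
```

Option 2 is usually easier to read because the axis labels stay in the original units.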
You may notice that we sometimes write 'ggplot2' and sometimes 'ggplot'. To clarify, 'ggplot2' is the name of the package, while the function you call to start a plot is 'ggplot()'.
To add an extra point to a scatterplot in ggplot2, add another geom_point layer. Supply the new point's coordinates through aes(), and set its colour with the colour argument.
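One possible way to do this is sketched below. The data frames d and extra are hypothetical; the extra point is given its own one-row data frame so that it is drawn exactly once.

```r
library(ggplot2)

# Hypothetical data for the base scatterplot
d <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 3, 5))

# Hypothetical extra point, stored in its own one-row data frame
extra <- data.frame(a = 2.5, b = 4.5)

p_extra <- ggplot(d, aes(x = a, y = b)) +
  geom_point() +
  # Second layer: same aesthetics, different data, fixed colour and size
  geom_point(data = extra, colour = "red", size = 3)

p_extra
```

Because colour is set outside aes(), it is a fixed property of the layer and does not create a legend entry.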
One way is to recover the binned data from the plot using ggplot_build().
First I made the plot without the line of best fit:
p <- ggplot(data = df, aes(x = con_rns))+
geom_point(stat = 'bin', binwidth = 0.1)+
scale_x_log10() +
scale_y_log10()
Then I added a trend line fitted to the binned data, which can be found in ggplot_build(p)$data (reversing the log10 transformation, because the built data are stored on the log10 scale):
p + geom_smooth(data = ggplot_build(p)$data[[1]],
mapping = aes(x=10^x, y= 10^y),method = "lm",se=FALSE)
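To see what the trend line is fitted to, it can help to inspect the recovered bins directly. The sketch below is not part of the original answer. To stay self-contained it draws the power-law samples by inverse-transform sampling in base R, an assumption standing in for rplcon(), and then fits a straight line to the binned values in log-log space.

```r
library(ggplot2)

# Assumption: inverse-transform sampling replaces poweRlaw::rplcon() here
set.seed(1)
xmin <- 1; alpha <- 1.5
samples <- xmin * (1 - runif(1000))^(-1 / (alpha - 1))
df <- data.frame(con_rns = samples[samples < 1000])

p <- ggplot(data = df, aes(x = con_rns)) +
  geom_point(stat = "bin", binwidth = 0.1) +
  scale_x_log10() +
  scale_y_log10()

# Layer 1 of the built plot holds the bins; x and y are on the log10 scale
binned <- ggplot_build(p)$data[[1]]

# Keep only bins with at least one observation (empty bins have y = -Inf)
binned <- binned[is.finite(binned$y), ]

# Straight-line fit in log-log space, the same kind of fit geom_smooth draws
fit <- lm(y ~ x, data = binned)
coef(fit)
```

The slope in coef(fit) describes the binned counts, so it depends on the bin width and on how noisy the sparsely populated tail bins are.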
UPDATE:
The additional problem was how to split the plot into different colour groups. I approached this in the same way, but I had to create a 'group' aesthetic so that the grouping would be kept in the ggplot_build data.
library(poweRlaw)
library(ggplot2)
xmin = 1; alpha = 1.5
con_rns = rplcon(1000, xmin, alpha)
#convert to data.frame format for ggplot2
df <- data.frame(con_rns =con_rns[con_rns<1000],col=sample(1:3,size = length(con_rns[con_rns<1000]),replace = TRUE))
p <- ggplot(data = df, aes(x = con_rns))+
geom_point(stat = 'bin', binwidth = 0.1, aes(colour=factor(col), group=factor(col)))+
scale_x_log10() +
scale_y_log10()
p + geom_smooth(data = ggplot_build(p)$data[[1]],
mapping = aes(x=10^x, y= 10^y, colour=factor(group)),method = "lm",se=FALSE)
Note that now that the data are grouped, some groups have a count of zero in some bins. Applying the log10 transformation to zero gives an infinite value, which triggers a warning. These points are removed from the plot and ignored in the trend lines.
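If you would rather make that exclusion explicit, one option is to drop the empty bins yourself before fitting. This is only a sketch: the samples again come from inverse-transform sampling rather than rplcon(), and binned_ok is a name introduced here. The warning from the point layer's own log10 transformation may still appear, since the filtering only affects the data passed to geom_smooth.

```r
library(ggplot2)

# Assumption: inverse-transform sampling replaces poweRlaw::rplcon() here
set.seed(1)
xmin <- 1; alpha <- 1.5
samples <- xmin * (1 - runif(1000))^(-1 / (alpha - 1))
samples <- samples[samples < 1000]
df <- data.frame(con_rns = samples,
                 col = sample(1:3, size = length(samples), replace = TRUE))

p <- ggplot(data = df, aes(x = con_rns)) +
  geom_point(stat = "bin", binwidth = 0.1,
             aes(colour = factor(col), group = factor(col))) +
  scale_x_log10() +
  scale_y_log10()

# Recover the per-group bins and drop those whose log10(count) is not finite
binned <- ggplot_build(p)$data[[1]]
binned_ok <- binned[is.finite(binned$y), ]

# One linear trend line per group, fitted only to the non-empty bins
p_fit <- p + geom_smooth(data = binned_ok,
                         mapping = aes(x = 10^x, y = 10^y,
                                       colour = factor(group)),
                         method = "lm", se = FALSE)
p_fit
```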