Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stochastic gradient descent from gradient descent implementation in R

I have a working implementation of multivariable linear regression using gradient descent in R. I'd like to see if I can use what I have to run a stochastic gradient descent. I'm not sure if this is really inefficient or not. For example, for each value of α I want to perform 500 SGD iterations and be able to specify the number of randomly picked samples in each iteration. It would be nice to do this so I could see how the number of samples influences the results. I'm having trouble through with the mini-batching and I want to be able to easily plot the results.

This is what I have so far:

 # Read and process the datasets

# download the files from GitHub
download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3x.dat", "ex3x.dat", method="curl")
x <- read.table('ex3x.dat')

# we can standardize the x vaules using scale()
x <- scale(x)

download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3y.dat", "ex3y.dat", method="curl")
y <- read.table('ex3y.dat')

# combine the datasets
data3 <- cbind(x,y)
colnames(data3) <- c("area_sqft", "bedrooms","price")
str(data3)

head(data3)

################ Regular Gradient Descent
# http://www.r-bloggers.com/linear-regression-by-gradient-descent/

# vector populated with 1s for the intercept coefficient
x1 <- rep(1, length(data3$area_sqft))

# appends to dfs
# create x-matrix of independent variables
x <- as.matrix(cbind(x1,x))
# create y-matrix of dependent variables
y <- as.matrix(y)
L <- length(y)

# cost gradient function: independent variables and values of thetas
cost <- function(x,y,theta){
  gradient <- (1/L)* (t(x) %*% ((x%*%t(theta)) - y))
  return(t(gradient)) 
}

# GD simultaneous update algorithm
# https://www.coursera.org/learn/machine-learning/lecture/8SpIM/gradient-descent
GD <- function(x, alpha){
      theta <- matrix(c(0,0,0), nrow=1) 
  for (i in 1:500) {
       theta <- theta - alpha*cost(x,y,theta)  
       theta_r <- rbind(theta_r,theta)    
  }
return(theta_r)
}

# gradient descent α = (0.001, 0.01, 0.1, 1.0) - defined for 500 iterations

alphas <- c(0.001,0.01,0.1,1.0)

# Plot price, area in square feet, and the number of bedrooms

# create empty vector theta_r
theta_r<-c()

for(i in 1:length(alphas)) {

 result <- GD(x, alphas[i])

 # red = price 
 # blue = sq ft 
 # green = bedrooms
 plot(result[,1],ylim=c(min(result),max(result)),col="#CC6666",ylab="Value",lwd=0.35,
      xlab=paste("alpha=", alphas[i]),xaxt="n") #suppress auto x-axis title
      lines(result[,2],type="b",col="#0072B2",lwd=0.35)
      lines(result[,3],type="b",col="#66CC99",lwd=0.35)
}

Is it more practical to find a way to use sgd()? I can't seem to figure out how to have the level of control I'm looking for with the sgd package

like image 638
Daina Avatar asked May 27 '16 13:05

Daina


People also ask

Is Stochastic Gradient Descent same as gradient descent?

The only difference comes while iterating. In Gradient Descent, we consider all the points in calculating loss and derivative, while in Stochastic gradient descent, we use single point in loss function and its derivative randomly. Check out these two articles, both are inter-related and well explained.

Is Stochastic Gradient Descent a descent method?

Stochastic gradient descent is an optimization algorithm often used in machine learning applications to find the model parameters that correspond to the best fit between predicted and actual outputs. It's an inexact but powerful technique. Stochastic gradient descent is widely used in machine learning applications.


1 Answers

Sticking with what you have now

## all of this is the same

download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3x.dat", "ex3x.dat", method="curl")
x <- read.table('ex3x.dat')
x <- scale(x)
download.file("https://raw.githubusercontent.com/dbouquin/IS_605/master/sgd_ex_data/ex3y.dat", "ex3y.dat", method="curl")
y <- read.table('ex3y.dat')
data3 <- cbind(x,y)
colnames(data3) <- c("area_sqft", "bedrooms","price")
x1 <- rep(1, length(data3$area_sqft))
x <- as.matrix(cbind(x1,x))
y <- as.matrix(y)
L <- length(y)
cost <- function(x,y,theta){
  gradient <- (1/L)* (t(x) %*% ((x%*%t(theta)) - y))
  return(t(gradient)) 
}

I added y to your GD function and created a wrapper function, myGoD, to call yours but first subsetting the data

GD <- function(x, y, alpha){
  theta <- matrix(c(0,0,0), nrow=1)
  theta_r <- NULL
  for (i in 1:500) {
    theta <- theta - alpha*cost(x,y,theta)  
    theta_r <- rbind(theta_r,theta)    
  }
  return(theta_r)
}

myGoD <- function(x, y, alpha, n = nrow(x)) {
  idx <- sample(nrow(x), n)
  y <- y[idx, , drop = FALSE]
  x <- x[idx, , drop = FALSE]
  GD(x, y, alpha)
}

Check to make sure it works and try with different Ns

all.equal(GD(x, y, 0.001), myGoD(x, y, 0.001))
# [1] TRUE

set.seed(1)
head(myGoD(x, y, 0.001, n = 20), 2)
#          x1        V1       V2
# V1 147.5978  82.54083 29.26000
# V1 295.1282 165.00924 58.48424

set.seed(1)
head(myGoD(x, y, 0.001, n = 40), 2)
#          x1        V1        V2
# V1 290.6041  95.30257  59.66994
# V1 580.9537 190.49142 119.23446

Here is how you can use it

alphas <- c(0.001,0.01,0.1,1.0)
ns <- c(47, 40, 30, 20, 10)

par(mfrow = n2mfrow(length(alphas)))
for(i in 1:length(alphas)) {

  # result <- myGoD(x, y, alphas[i]) ## original
  result <- myGoD(x, y, alphas[i], ns[i])

  # red = price 
  # blue = sq ft 
  # green = bedrooms
  plot(result[,1],ylim=c(min(result),max(result)),col="#CC6666",ylab="Value",lwd=0.35,
       xlab=paste("alpha=", alphas[i]),xaxt="n") #suppress auto x-axis title
  lines(result[,2],type="b",col="#0072B2",lwd=0.35)
  lines(result[,3],type="b",col="#66CC99",lwd=0.35)
}

enter image description here

You don't need the wrapper function--you can just change your GD slightly. It is always good practice to explicitly pass arguments to your functions rather than relying on scoping. Before you were assuming that y would be pulled from your global environment; here y must be given or you will get an error. This will avoid many headaches and mistakes down the road.

GD <- function(x, y, alpha, n = nrow(x)){
  idx <- sample(nrow(x), n)
  y <- y[idx, , drop = FALSE]
  x <- x[idx, , drop = FALSE]
  theta <- matrix(c(0,0,0), nrow=1)
  theta_r <- NULL

  for (i in 1:500) {
    theta <- theta - alpha*cost(x,y,theta)  
    theta_r <- rbind(theta_r,theta)    
  }
  return(theta_r)
}
like image 120
rawr Avatar answered Sep 29 '22 14:09

rawr