I am running the h2o package in RStudio version 0.99.447 on OS X 10.9.5.
I would like to set up a local cluster within R, following the steps of this tutorial: http://blenditbayes.blogspot.co.uk/2014/07/things-to-try-after-user-part-1-deep.html
The first step does not seem to be a problem. What does seem to be a problem is converting my data frame to a proper h2o object.
library(mlbench)
dat = BreastCancer[,-1] #reading in data set from mlbench package
library(h2o)
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE) #sets up the cluster
dat_h2o <- as.h2o(localH2O, dat, key = 'dat') #this returns an error message
The as.h2o call above results in the following error message:
Error in as.h2o(localH2O, dat, key = "dat") :
unused argument (key = "dat")
If I remove the "key" parameter, letting the data reside in the H2O key-value store under a machine generated name, the following error message comes up.
Error in .h2o.doSafeREST(conn = conn, h2oRestApiVersion = h2oRestApiVersion,
Unexpected CURL error: Empty reply from server
This question asks the same thing as me, but the solution leads me to the same error.
Does anyone have experience with this problem? I'm not entirely sure how to approach this.
The syntax for importing a frame from R into H2O changed between the last stable release of H2O-Classic and the latest stable release of H2O-3.0. I believe you are using an H2O-3.0 release, which means some of the function arguments have changed; in particular, the ambiguous "key" argument has been renamed to "destination_frame".
H2O-3.0 also behaves differently here: it notices that the first 5 columns of the R data frame are ordered factors, and at the moment we don't have a way of preserving the order of categorical columns. For now, to reproduce the results posted at http://blenditbayes.blogspot.co.uk/2014/07/things-to-try-after-user-part-1-deep.html you'll have to write the frame to disk as a CSV and import it into H2O.
library(mlbench)
dat = BreastCancer[,-1] #reading in data set from mlbench package
library(h2o)
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
#dat_h2o <- as.h2o(dat, destination_frame = 'dat')
## Will return a "Provided column type c("ordered", "enum") is unknown." error
pathToData <- file.path(normalizePath("~/Downloads"), "dat.csv")
write.table(x = dat, file = pathToData, sep = ",", row.names = FALSE, col.names = TRUE) # sep = "," so the file really is a CSV
dat_h2o <- h2o.importFile(path = pathToData, destination_frame = "dat")
For R data.frames that do not have ordered factor columns, you can simply use h2o_frame <- as.h2o(object = df), where class(df) is "data.frame".
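A possible alternative to the CSV round trip is to strip the ordering from the ordered factor columns before calling as.h2o. The sketch below uses a made-up toy data frame (df, size and class are hypothetical names, not from the question) and only base R. Whether as.h2o then accepts the frame is an assumption I have not verified against a running cluster, so that call is left commented out.

```r
# Toy data frame standing in for one with ordered factor columns (hypothetical data)
df <- data.frame(
  size  = factor(c("low", "high", "mid"),
                 levels = c("low", "mid", "high"), ordered = TRUE),
  class = factor(c("a", "b", "a"))
)

# Find the ordered factor columns
is_ord <- vapply(df, is.ordered, logical(1))

# Drop the ordering; labels and level order are kept
df[is_ord] <- lapply(df[is_ord], function(col) factor(col, ordered = FALSE))

# No column is an ordered factor any more
print(vapply(df, is.ordered, logical(1)))

# Assumption, untested here: needs a running H2O-3 cluster
# h2o_frame <- as.h2o(df)
```

The ordering information is lost either way, so this only saves the write to disk.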
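The BreastCancer data frame has 5 ordered factors and 5 unordered factors. As Amy Wang wrote, the ordered factors are the problem, so one workaround is to convert the factors to numeric. If you don't want to write the data to disk and read it back in, you can convert them with sapply(). Note that as.numeric() on a factor returns the integer level codes, not the labels.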
## Format data with no factor
data(BreastCancer, package = 'mlbench') # Load data from mlbench package
dat <- BreastCancer[, -1] # Remove the ID column
dat[, c(1:ncol(dat))] <- sapply(dat[, c(1:ncol(dat))], as.numeric) # Convert factors into numeric
## Start a local cluster with default parameters
library(h2o)
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
## Convert Breast Cancer into H2O
dat.h2o <- as.h2o(dat, destination_frame = "midata")
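One caveat with the conversion above: as.numeric() on a factor returns level codes, which match the labels only when the levels happen to be 1, 2, 3, ... with no gaps. A minimal base-R sketch on a toy factor (f and df are hypothetical, not the BreastCancer data) shows the difference and a label-preserving variant. The variant only makes sense for columns whose labels are numbers; a column with text labels would become NA.

```r
# Toy factor whose labels are numbers but whose levels skip values
f <- factor(c("1", "3", "10", "3"), levels = c("1", "3", "10"))

as.numeric(f)                # level codes:       1 2 3 2
as.numeric(as.character(f))  # labels as numbers: 1 3 10 3

# Apply the label-preserving conversion to every column of a data frame
df <- data.frame(a = f, b = factor(c("2", "2", "5", "7")))
df[] <- lapply(df, function(col) as.numeric(as.character(col)))
str(df)
```

It may be worth checking levels() on each column before relying on the plain as.numeric() conversion.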
Try this. It worked for me.
## S3 method for class 'data.frame'
dat.hex <- as.h2o(dat, destination_frame = "dat.hex")