
Unable to convert data frame to h2o object

Tags: dataframe, r, h2o

I am running the h2o package in RStudio version 0.99.447 on OS X 10.9.5.

I would like to set up a local cluster within R, following the steps of this tutorial: http://blenditbayes.blogspot.co.uk/2014/07/things-to-try-after-user-part-1-deep.html

The first step does not seem to be a problem. What does seem to be a problem is converting my data frame to a proper h2o object.

library(mlbench)
dat = BreastCancer[,-1] #reading in data set from mlbench package

library(h2o)
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE) #sets up the cluster
dat_h2o <- as.h2o(localH2O, dat, key = 'dat') #this returns an error message

The as.h2o call above results in the following error message:

Error in as.h2o(localH2O, dat, key = "dat") : 
unused argument (key = "dat")

If I remove the "key" parameter, letting the data reside in the H2O key-value store under a machine-generated name, I get the following error message instead:

Error in .h2o.doSafeREST(conn = conn, h2oRestApiVersion = h2oRestApiVersion,  
Unexpected CURL error: Empty reply from server

This question asks the same thing as me, but the solution leads me to the same error.

Does anyone have experience with this problem? I'm not entirely sure how to approach this.

Boudewijn Aasman asked Jul 15 '15 23:07


3 Answers

The syntax for importing a frame from R into H2O changed between the last stable release of H2O-Classic and the latest stable release of H2O-3.0. I believe you are using an H2O-3.0 release, which means some of the function arguments have changed; in particular, the ambiguous "key" argument has been renamed to "destination_frame".

H2O-3.0 also behaves differently here: it detects that the first 5 columns of the R data frame are ordered factors, and at the moment we have no way of preserving the order of categorical columns. To reproduce the results posted at http://blenditbayes.blogspot.co.uk/2014/07/things-to-try-after-user-part-1-deep.html, for now you will have to write the frame to disk as a CSV and import it into H2O.

library(mlbench)
dat = BreastCancer[,-1] #reading in data set from mlbench package

library(h2o)
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)

#dat_h2o <- as.h2o(dat, destination_frame = 'dat') 
## Will return a "Provided column type c("ordered", "enum") is unknown." error

pathToData <- file.path(normalizePath("~/Downloads"), "dat.csv")
write.csv(x = dat, file = pathToData, row.names = FALSE) # comma-separated, with a header row
dat_h2o <- h2o.importFile(path = pathToData, destination_frame = "dat")

For R data frames that do not have ordered factor columns, you can simply use h2o_frame <- as.h2o(object = df), where df is a data.frame.
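If writing to disk is inconvenient, another option is to strip the ordering from the ordered factors in R first. The following is only a base-R sketch of that step on a made-up toy data frame (the column names and values are hypothetical). Whether as.h2o then accepts the result on your H2O version is an assumption I have not verified, and the ordering information is lost either way.

```r
# Toy data frame standing in for the ordered-factor columns of BreastCancer
# (hypothetical column names and values, for illustration only).
df <- data.frame(
  thickness = factor(c(1, 5, 3), levels = 1:10, ordered = TRUE),
  class     = factor(c("benign", "malignant", "benign"))
)

# Turn every ordered factor into a plain factor, keeping its full set of levels.
df[] <- lapply(df, function(col) {
  if (is.ordered(col)) factor(col, levels = levels(col), ordered = FALSE) else col
})

# Check which columns are still ordered factors.
still_ordered <- sapply(df, is.ordered)
print(still_ordered)
```

Passing levels = levels(col) explicitly matters: without it, factor() would silently drop any level that does not occur in the data.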

Amy Wang answered Nov 10 '22 05:11


The BreastCancer data frame has 5 ordered factors and 5 factors. As Amy Wang wrote, the ordered factors are what as.h2o cannot handle. If you don't want to write the data to disk and read it back in, one workaround is to convert the factors to numeric in R.

## Format data with no factor
data(BreastCancer, package = 'mlbench') # Load data from mlbench package
dat <- BreastCancer[, -1]  # Remove the ID column
dat[] <- lapply(dat, as.numeric) # Replace every factor with its integer level codes (Class becomes 1/2)



## Start a local cluster with default parameters
library(h2o)
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)

## Convert Breast Cancer into H2O
dat.h2o <- as.h2o(dat, destination_frame = "midata")
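
One caveat with this conversion: as.numeric() on a factor returns the internal level codes, not the labels. The two can differ whenever the levels are not exactly 1, 2, 3, ... in order, so it may be worth comparing both forms against levels() for each column. A small base-R sketch with made-up values:

```r
# A factor whose labels look numeric (hypothetical values).
f <- factor(c("10", "2", "10", "5"))

levels(f)                               # levels are sorted as strings, not as numbers
codes  <- as.numeric(f)                 # internal level codes
labels <- as.numeric(as.character(f))   # labels parsed as numbers

print(codes)
print(labels)
```

If the numeric labels are what you need, going through as.character() first is the usual idiom.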

Juan Pueyo answered Nov 10 '22 06:11


Try this. It worked for me.

## Pass the data frame first and name the H2O frame with destination_frame
dat.hex <- as.h2o(dat, destination_frame = "dat.hex")

Nikhar Khandelwal answered Nov 10 '22 07:11