I have a few questions about Sparkling Water and why it is needed. Let's assume I have a trained H2O model available both as a binary model and as a POJO. I now want to deploy the model to production, and I can use either the POJO directly in Spark or the binary model with Sparkling Water. <ol> <li>Which one should I use: plain Spark with the POJO, or Sparkling Water with the binary model?</li> <li>What exactly is Sparkling Water for, when we can easily deploy a model using a POJO and Spark alone?</li> <li>Is Sparkling Water needed only when you have to train a model on huge amounts of data, or can it also be used for production deployments of models?</li> </ol> Example: https://github.com/h2oai/h2o-droplets/blob/master/h2o-pojo-on-spark-droplet/src/main/scala/examples/PojoExample.scala uses Spark to run a POJO model. Example: https://github.com/h2oai/h2o-droplets/blob/master/sparkling-water-droplet/src/main/scala/water/droplets/SparklingWaterDroplet.scala trains and runs a model in Sparkling Water. What advantages does Sparkling Water provide over plain Spark?

<ol> <li> Which one should I use: plain Spark with the POJO, or Sparkling Water with the binary model? <ul> <li>There is no single right answer; it depends on your use case. It sounds like what you want is the POJO/MOJO in Spark, so that you can score without the added dependency of a running H2O cluster.</li> </ul> </li> <li> What exactly is Sparkling Water for, when we can easily deploy a model using a POJO and Spark alone? <ul> <li>Sparkling Water exists to make H2O available inside a Spark context. This is particularly useful for training: you can leverage Spark's many data connectors, data-munging capabilities, and so on. POJO/MOJO plus Spark is sufficient for scoring.</li> </ul> </li> <li> Is Sparkling Water needed only when you have to train a model on huge amounts of data, or can it also be used for production deployments of models? <ul> <li>Sparkling Water is needed when you want to use H2O's algorithms in a context that plays nicely with the Spark ecosystem.</li> </ul> </li> </ol> If putting a model in "production" means having "always on" scoring exposed as a REST endpoint or similar, the POJO/MOJO is the way to go, because H2O clusters are not highly available. You will need to make sure you handle incoming data correctly yourself, though. If you are doing batch scoring, nightly or otherwise, it may make sense to use the binary model with Sparkling Water, because parsing incoming data becomes trivial (asH2OFrame(..)) and scoring is as easy as calling predict().
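To make the point about handling incoming data yourself more concrete, here is a minimal sketch of the kind of preparation a REST scoring service might do before passing a row to a POJO/MOJO. It is written in plain Python only to stay self-contained. The schema, the column names, and the `prepare_row` helper are all hypothetical and not part of any H2O API, and treating an unseen categorical level as missing is just one possible policy assumed for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical schema, in the column order the model was trained with:
# name -> ("numeric", None) or ("categorical", [levels seen in training]).
Schema = Dict[str, Tuple[str, Optional[List[str]]]]

SCHEMA: Schema = {
    "age": ("numeric", None),
    "income": ("numeric", None),
    "segment": ("categorical", ["retail", "business"]),
}


def prepare_row(payload: Dict[str, Any], schema: Schema) -> List[Any]:
    """Convert a raw request payload into a row in the model's column order.

    Assumed policy (not H2O behavior): missing or empty fields become None,
    numeric fields are coerced to float (bad input raises ValueError),
    unseen categorical levels become None, and extra fields are ignored.
    """
    row: List[Any] = []
    for name, (kind, levels) in schema.items():
        value = payload.get(name)
        if value is None or value == "":
            row.append(None)
        elif kind == "numeric":
            row.append(float(value))
        else:
            text = str(value)
            row.append(text if levels is not None and text in levels else None)
    return row


if __name__ == "__main__":
    # "income" is absent, "segment" has an unseen level, "extra" is ignored.
    print(prepare_row({"age": "42", "segment": "wholesale", "extra": 1}, SCHEMA))
```

With the binary model in Sparkling Water this step is largely handled when the data is parsed into an H2O frame, which is the trade-off the answer describes.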

Difference between spark with h2o and sparkling water

1 Answer

answered Nov 13 '22 08:11

Nick Karpov


Tags:

h2o

sparkling-water

Lalit Agarwal
