We are trying to deploy Presto on Kubernetes. We have a Kubernetes cluster running in the cloud as a managed service. I searched for this but could not find anything conclusive about the best practices for deploying Presto on Kubernetes. There is the official Presto GitHub repository, but it does not help here. Below are the two questions I am trying to seek an answer for:
You could install it with the official Helm chart from https://github.com/helm/charts/tree/master/stable/presto, which provides an option to set the number of workers. With the official chart you should be able to ask questions in the Kubernetes charts Slack channel (through http://slack.k8s.io) and raise issues on GitHub if you hit any. There are also non-Helm examples, such as https://github.com/dharmeshkakadia/presto-kubernetes.
The question of how many workers to run isn't specific to Kubernetes. It depends on how much load, and what kind of load, the deployment needs to handle, and also on the hardware your Kubernetes cluster is running on. If you're not sure, you can deploy with the defaults and adjust as needed, as suggested by https://prestodb.io/presto-admin/docs/current/installation/presto-configuration.html. You'll find some of the settings, such as memory per node, in the Deployment parts of the Kubernetes YAML descriptors, or in values.yaml in the case of the Helm chart.
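As a rough starting point before tuning, you can at least work out how many workers your hardware can hold. The sketch below is only back-of-the-envelope arithmetic, not a sizing rule from the Presto documentation: the function name and all of its parameters are made up for illustration, and it looks at memory only, ignoring CPU, the coordinator pod, and the actual query load.

```python
def max_workers_by_memory(node_count, node_allocatable_gib,
                          worker_request_gib, reserved_gib_per_node=0.0):
    """Upper bound on worker pods that fit in the cluster, by memory alone.

    All parameters are hypothetical inputs you would read off your own
    cluster and your own values.yaml / Deployment descriptor.
    """
    if worker_request_gib <= 0:
        raise ValueError("worker_request_gib must be positive")
    # Memory left on each node after setting aside room for other pods.
    usable_gib = node_allocatable_gib - reserved_gib_per_node
    if usable_gib <= 0:
        return 0
    # Whole workers that fit on one node, multiplied across all nodes.
    workers_per_node = int(usable_gib // worker_request_gib)
    return node_count * workers_per_node


if __name__ == "__main__":
    # Example: 3 nodes with 32 GiB allocatable each, 10 GiB requested per worker.
    print(max_workers_by_memory(3, 32, 10))
```

This only gives a ceiling. The number of workers you actually need still has to come from running your own queries and adjusting.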
To performance test your deployment you will need test data, and you can then run queries against the cluster. This is the same process you would follow outside of Kubernetes. There are tools to help, such as https://www.lewuathe.com/use-benchto-for-evaluation-of-presto.html or https://github.com/prestodb/tempto. You may also want to look at https://kognitio.com/blog/presto-performance-powerful-or-problematic/.
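If you just want a quick number before setting up one of those tools, a minimal timing loop could look something like the sketch below. It is not part of Benchto or Tempto. The `run_query` argument is a hypothetical callable that you would supply yourself (for example a thin wrapper around whichever Presto client you use), and the example at the bottom uses a stand-in that only sleeps.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_queries(run_query: Callable[[str], object],
                 queries: List[str],
                 repeats: int = 3) -> Dict[str, float]:
    """Return the median wall-clock seconds for each query.

    run_query is a placeholder: any function that takes a SQL string,
    sends it to the cluster, and returns once the results are fetched.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    medians = {}
    for sql in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(sql)
            samples.append(time.perf_counter() - start)
        # The median is less sensitive to one slow outlier than the mean.
        medians[sql] = statistics.median(samples)
    return medians


if __name__ == "__main__":
    # Stand-in runner so the sketch runs without a cluster.
    def fake_run_query(sql: str) -> None:
        time.sleep(0.01)

    for sql, seconds in time_queries(fake_run_query, ["SELECT 1"]).items():
        print(f"{sql}: {seconds:.3f}s")
```

Running the same set of queries before and after changing the worker count or memory settings gives you a simple like-for-like comparison.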
There are a couple of examples available of how this could be done, for example dharmeshkakadia/presto-kubernetes, but I guess you might want to use a StatefulSet here instead. I'm not sure about performance tests, because much will depend on the kind of persistent volume you choose, or rather on what backs it: for example NFS, Ceph, or maybe you are in a cloud environment with native storage?