I was looking at Databricks because it integrates with AWS services like Kinesis, but it looks to me like SageMaker is a direct competitor to Databricks. We are heavily using AWS; is there any reason to add Databricks to the stack, or does SageMaker fill the same role?
Databricks focuses on big data analytics, letting you run your data-processing code on compute clusters. SageMaker focuses on experiment tracking and model deployment. Both tools let data scientists write code in a familiar notebook environment and run it on scalable infrastructure.
Although AWS EMR integrates with other AWS services, users have to spend time configuring the tooling themselves. Databricks, by contrast, lets less technical users perform data science and analytics at scale without much prior infrastructure knowledge.
SageMaker does not provide a built-in way to schedule training jobs, nor an easy mechanism for tracking metrics logged during training. We often fit combined feature-extraction and model pipelines: we can inject the model artifacts into the AWS-provided containers, but we cannot inject the fitted feature extractors (the sketch below illustrates why that matters).
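To make that last point concrete, here is a minimal sketch in scikit-learn of the kind of combined pipeline meant: the feature extractor is fitted alongside the model, so serving only the bare model artifact loses the extraction step. The tiny training corpus is, of course, hypothetical.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# The feature extractor and the model are fitted together as one pipeline.
pipeline = Pipeline([
    ("features", TfidfVectorizer()),   # fitted feature extractor
    ("model", LogisticRegression()),   # the model artifact itself
])
pipeline.fit(["great product", "terrible product"], [1, 0])

# Serializing only pipeline.named_steps["model"] for the serving container
# would drop the fitted TfidfVectorizer; the whole pipeline must ship together.
```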
Amazon SageMaker is a fully managed machine learning service. With SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment.
SageMaker is a great tool for deployment: it simplifies much of the container configuration, and you only need to write two or three lines to deploy a model as an endpoint and use it (see the sketch below). SageMaker also provides a dev platform (Jupyter Notebook) that supports Python and Scala (via the sparkmagic kernel), and I managed to install an external Scala kernel in the Jupyter notebook. Overall, SageMaker provides end-to-end ML services. Databricks has an unbeatable notebook environment for Spark development.
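For reference, a hedged sketch of that short deployment path using the SageMaker Python SDK; the S3 artifact path, IAM role, and inference.py entry-point script are hypothetical placeholders:

```python
from sagemaker.sklearn import SKLearnModel

# Hypothetical artifact location, role ARN, and entry-point script.
model = SKLearnModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    entry_point="inference.py",          # defines model_fn / predict_fn hooks
    framework_version="1.2-1",
)

# One call stands up a managed HTTPS endpoint; one more gets a prediction.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
print(predictor.predict([[0.1, 0.2, 0.3, 0.4]]))
```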
Conclusion
Databricks is the better platform for big data (Scala, PySpark) development, with an unbeatable notebook environment.
SageMaker is better for deployment, and if you are not working with big data it is a perfect choice: Jupyter notebooks + scikit-learn + mature containers + super easy deployment.
SageMaker provides "real time inference", very easy to build and deploy, very impressive. you can check the official SageMaker Github. https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/scikit_learn_inference_pipeline
Having worked in both environments within the last year, I specifically remember:
Databricks having easy access to stored databases/tables to query with Scala/Spark from within the notebooks. It was nice to preview the schemas, query quickly, and be off to the races for research (a sketch of that workflow follows after these notes). I also remember how quick it was to set up a timed job on a notebook (re-run every month) and rescale to cheaper job instance types with a few button clicks. These capabilities might exist somewhere in AWS, but I remember them being great in Databricks.
AWS SageMaker + Lambda + API Gateway: just today I worked through a deployment of SageMaker + Lambda + API Gateway, and after getting used to some syntax and specifics of Lambda and API Gateway it was pretty straightforward (a sketch of the Lambda piece follows below). Another AWS deployment wouldn't take more than 20 minutes, pending unique specifics. Other features like Model Monitoring and CloudWatch are nice as well. The Jupyter notebooks offer kernels for many languages, including Python (which I used), R, and Scala, with packages like conda and the SageMaker ML libraries pre-installed.
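To illustrate the Databricks workflow mentioned above, a minimal sketch of querying a stored table from a notebook cell; the sales table is a hypothetical example, and spark and display are globals that Databricks notebooks provide automatically:

```python
# In a Databricks notebook, `spark` (a SparkSession) and `display` are
# pre-defined; the "sales" table is a hypothetical example.
df = spark.sql("SELECT region, SUM(revenue) AS total FROM sales GROUP BY region")
display(df)  # Databricks' built-in tabular preview of schema and rows
```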
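And for the SageMaker + Lambda + API Gateway setup just described, a hedged sketch of the Lambda piece: API Gateway forwards the request body to this handler, which calls the SageMaker endpoint (the endpoint name, read from an environment variable here, is a placeholder):

```python
import json
import os
import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # With a Lambda proxy integration, API Gateway passes the raw request body.
    response = runtime.invoke_endpoint(
        EndpointName=os.environ["ENDPOINT_NAME"],  # hypothetical env var
        ContentType="application/json",
        Body=event["body"],
    )
    result = response["Body"].read().decode()
    return {"statusCode": 200, "body": json.dumps({"prediction": result})}
```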