Just a quick question for the experts: AWS Glue, as an ETL tool, gives companies benefits such as minimal or no server maintenance and cost savings from avoiding over- or under-provisioning of resources, and it also runs on Spark. Given that, can AWS Glue replace EMR?
If both can coexist, what role can EMR play alongside AWS Glue?
Thanks & regards
Yuva
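AWS Glue is a flexible, easily scalable ETL service because it is serverless: AWS provisions and manages the underlying Spark capacity for you. Amazon EMR, on the other hand, is less hands-off, because it runs on clusters of EC2 instances that you provision and manage yourself (it does not run on your on-premises platform). So, in short, if your requirements change often and you need to scale up and down without managing infrastructure, AWS Glue is the more viable option.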
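To make the provisioning difference concrete, here is a minimal sketch of the request parameters each service takes, shaped as keyword arguments for boto3's `glue.create_job` and `emr.run_job_flow`. The job name, IAM role names, script path, Glue version, EMR release label, and instance types are placeholder assumptions, not recommendations.

```python
"""Sketch: how capacity is expressed for a Glue job vs. an EMR cluster.

All names, roles, S3 paths, versions, and instance sizes are hypothetical
placeholders; the dicts mirror boto3's glue.create_job / emr.run_job_flow kwargs.
"""


def glue_job_request(name, script_s3_path, role, workers=10):
    # Glue: you only choose a worker type and count; AWS runs the Spark cluster.
    return {
        "Name": name,
        "Role": role,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": workers,
    }


def emr_cluster_request(name, core_nodes=4):
    # EMR: you choose the release, applications, and EC2 instance groups yourself.
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            # Keep the cluster running after steps finish (you pay while it is up).
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

Something like `boto3.client("glue").create_job(**glue_job_request(...))` would submit the Glue definition. To scale Glue you change only `NumberOfWorkers`, while on EMR you resize instance groups (or configure managed scaling) and decide when the cluster shuts down.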
The AWS Glue Data Catalog provides a unified metadata repository across a variety of data sources and data formats, integrating with Amazon EMR as well as Amazon RDS, Amazon Redshift, Redshift Spectrum, Athena, and any application compatible with the Apache Hive metastore.
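For the EMR integration mentioned above, a cluster is typically pointed at the Glue Data Catalog through configuration classifications. The sketch below builds the `hive-site` and `spark-hive-site` classifications with the `hive.metastore.client.factory.class` property; the resulting list could be passed as `Configurations=` to boto3's `emr.run_job_flow` or saved as JSON for the console or CLI. Treat it as a sketch and check the EMR release guide for your release.

```python
import json

# Factory class that makes Hive/Spark SQL on EMR use the Glue Data Catalog as metastore.
GLUE_CATALOG_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations():
    """Return EMR configuration classifications that make Hive and Spark SQL
    use the AWS Glue Data Catalog as their metastore."""
    props = {"hive.metastore.client.factory.class": GLUE_CATALOG_FACTORY}
    return [
        {"Classification": "hive-site", "Properties": dict(props)},
        {"Classification": "spark-hive-site", "Properties": dict(props)},
    ]


if __name__ == "__main__":
    # Print the JSON shape used for EMR cluster configurations.
    print(json.dumps(glue_catalog_configurations(), indent=2))
```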
AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores and data streams.
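Running a Glue ETL job from code is typically a start-then-poll loop. This minimal sketch assumes `glue` is a boto3 Glue client and uses the `start_job_run` and `get_job_run` calls; the job name and the `--date` argument in the usage note are hypothetical (Glue expects job arguments keyed with a leading `--`).

```python
import time

# Terminal states reported by Glue's GetJobRun API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_glue_job(glue, job_name, arguments=None, poll_seconds=30):
    """Start a Glue job run and block until it reaches a terminal state.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    Returns the final JobRunState string.
    """
    run_id = glue.start_job_run(JobName=job_name, Arguments=arguments or {})["JobRunId"]
    while True:
        state = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        # Still STARTING/RUNNING/STOPPING/WAITING: wait before asking again.
        time.sleep(poll_seconds)
```

Usage might look like `run_glue_job(boto3.client("glue"), "nightly-orders-etl", {"--date": "2024-01-01"})`. Passing the client in, rather than creating it inside the function, keeps the helper testable without AWS credentials.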
AWS Data Pipeline, Apache Airflow, Apache Spark, Talend, and Alooma are often cited as popular alternatives and competitors to AWS Glue.
As per my understanding, Glue cannot replace EMR; it really depends on your use case. Glue ETL has some limitations.
With the Glue Data Catalog you can query your data in Athena, but Athena also had a few limitations, such as no CREATE TABLE AS SELECT (CTAS) and no CREATE VIEW (Athena has since added support for both). You can use the Glue Data Catalog from EMR to work around such limitations of Athena.
So, for now, Glue (through its Data Catalog) can serve as a replacement for a persistent metadata store such as a Hive metastore.
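As an illustration of working through EMR, the sketch below runs a CTAS statement and a CREATE VIEW with Spark SQL. It assumes `spark` is a SparkSession created with `enableHiveSupport()` on an EMR cluster whose metastore is the Glue Data Catalog; the `sales_db` database, the `orders` table, and its columns are hypothetical. Views created by Spark are stored with Spark SQL text, so they may not be queryable from Athena.

```python
def derive_tables(spark, db="sales_db"):
    """Run a CTAS and a CREATE VIEW through Spark SQL.

    `spark` is expected to be a SparkSession built with enableHiveSupport()
    on an EMR cluster using the Glue Data Catalog as its metastore.
    Database, table, and column names are hypothetical.
    Returns the list of statements that were executed.
    """
    statements = [
        # CTAS: materialise an aggregate as a new Parquet table in the catalog.
        f"CREATE TABLE IF NOT EXISTS {db}.daily_totals USING parquet AS "
        f"SELECT order_date, SUM(amount) AS total FROM {db}.orders GROUP BY order_date",
        # View: a saved query registered in the catalog; no data is copied.
        f"CREATE OR REPLACE VIEW {db}.large_orders AS "
        f"SELECT * FROM {db}.orders WHERE amount > 1000",
    ]
    for stmt in statements:
        spark.sql(stmt)
    return statements
```

On the cluster, something like `derive_tables(SparkSession.builder.enableHiveSupport().getOrCreate())` would run both statements against the shared catalog.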