 

Can we consider AWS Glue as a replacement for EMR?

A quick question for the experts: AWS Glue, as an ETL tool, offers benefits such as minimal or no server maintenance and cost savings from avoiding over-provisioning or under-provisioning of resources, and it runs on Spark. Given that, can AWS Glue replace EMR?

If the two can co-exist, what role can EMR play alongside AWS Glue?

Thanks & regards

Yuva

Yuva asked Jan 12 '18 09:01



1 Answer

As per my understanding, Glue cannot replace EMR outright; it depends on your use case. Glue ETL has some limitations:

  • It does not support the spark-submit --packages option for pulling in dependencies.
  • It gives you no internal storage for temporary data.
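One workaround for the missing --packages option, sketched here on the assumption that the dependencies have already been uploaded to S3, is to pass them through Glue's `--extra-jars` and `--extra-py-files` job parameters. The helper name and the S3 URIs below are hypothetical; the resulting mapping is the kind of value you would supply as a job's default arguments.

```python
def glue_default_arguments(jar_uris, py_file_uris):
    """Build a Glue job argument mapping that ships extra dependencies.

    Hypothetical helper: Glue has no --packages, so each dependency is
    referenced by its S3 URI instead of by Maven coordinates.
    """
    args = {}
    if jar_uris:
        # Glue expects a comma-separated list of S3 paths to JAR files.
        args["--extra-jars"] = ",".join(jar_uris)
    if py_file_uris:
        # Same convention for additional Python modules (.py, .zip, .whl).
        args["--extra-py-files"] = ",".join(py_file_uris)
    return args


if __name__ == "__main__":
    # The bucket and file names are placeholders, not real artifacts.
    print(glue_default_arguments(
        ["s3://my-bucket/libs/connector.jar"],
        ["s3://my-bucket/libs/helpers.zip"],
    ))
```

Unlike --packages, this does not resolve transitive dependencies, so every required JAR has to be listed explicitly.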

With the Glue Data Catalog you can query the data in Athena, but Athena has a few limitations of its own: at the time of writing, it cannot create a table as select (CTAS) or create views. You can use the Glue Data Catalog from EMR to work around those Athena limitations.

So, for now, what Glue can replace is a persistent metadata store, not EMR itself.
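As a rough sketch of pointing EMR at the Glue Data Catalog, the cluster can be given configuration classifications that swap the Hive metastore client for the Glue one. The function name is hypothetical, and the snippet only builds the configuration list (the shape an EMR cluster-creation request accepts); it does not create a cluster.

```python
import json

# Metastore client factory class that EMR uses to talk to the Glue Data Catalog.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore."
    "AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations():
    """Return EMR configuration entries that make Hive and Spark SQL
    use the Glue Data Catalog as their metastore (hypothetical helper)."""
    properties = {"hive.metastore.client.factory.class": GLUE_FACTORY}
    return [
        # Applies to Hive running on the cluster.
        {"Classification": "hive-site", "Properties": dict(properties)},
        # Applies to Spark SQL running on the cluster.
        {"Classification": "spark-hive-site", "Properties": dict(properties)},
    ]


if __name__ == "__main__":
    print(json.dumps(glue_catalog_configurations(), indent=2))
```

With this in place, tables defined in the catalog should be visible to both Athena and Spark or Hive on EMR, so the work Athena cannot do can run on the cluster against the same metadata.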

Ashutosh answered Oct 12 '22 07:10