Can AWS Lambda be preferred over an AWS Glue job?

In an AWS Glue job, we can write a script and execute it via the job.

In AWS Lambda, too, we can write the same script and execute the same logic as in the job above.

So my question is not what the difference is between an AWS Glue job and AWS Lambda. I am trying to understand when an AWS Glue job should be preferred over AWS Lambda, especially when both do the same job. If both do the same job, then ideally I would just prefer AWS Lambda, right?

Please try to understand my question.

555

asked Aug 26 '20 14:08

john

3 Answers

Additional points:

Per this source, the Lambda FAQ, and the Glue FAQ:

Lambda supports a number of languages (Node.js, Python, Go, Java, etc.), whereas Glue can only execute jobs written in Scala or Python.

Lambda can execute code in response to triggers from other services (SQS, Kafka, DynamoDB, Kinesis, CloudWatch, etc.), whereas Glue jobs can be triggered from Lambda, by other Glue jobs, manually, or on a schedule.

Lambda runs much faster for smaller tasks, whereas Glue jobs take longer to initialize because they use distributed processing. That said, Glue leverages its parallel processing to run large workloads faster than Lambda.

Lambda appears to require more code and complexity to integrate with data sources (Redshift, RDS, S3, DBs running on ECS instances, DynamoDB, etc.), while Glue integrates with these easily. However, with Step Functions, multiple Lambda functions can be written and ordered sequentially to reduce complexity and improve modularity, with each function integrating with one AWS service.

Glue has a number of additional components: the Data Catalog, a central metadata repository for viewing your data; a flexible scheduler that handles dependency resolution, job monitoring, and retries; AWS Glue DataBrew for cleaning and normalizing data with a visual interface; AWS Glue Elastic Views for combining and replicating data across multiple data stores; and AWS Glue Schema Registry for validating streaming data schemas.

There are other examples I am missing, so feel free to comment and I can update.

136

answered Oct 18 '22 03:10

deesolie

A Lambda invocation has a maximum lifetime of fifteen minutes. Lambda can be used to trigger a Glue job as an event-based activity: for example, when a file lands in S3, an event trigger can run a Glue job. Glue is a managed service for data processing.

If the data volume is very small, maybe you can process it in Lambda, but if for some reason the process runs beyond fifteen minutes, the data processing will fail.
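The event-based trigger described above could look roughly like the following sketch of a Lambda handler that starts a Glue job when an S3 event arrives. The job name and the argument names (--source_bucket, --source_key) are hypothetical, and the optional glue_client parameter is only there so the handler can be exercised without AWS credentials.

```python
import urllib.parse

GLUE_JOB_NAME = "my-etl-job"  # hypothetical Glue job name


def build_job_arguments(event):
    """Extract the bucket and key from the first S3 event record."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    # Hypothetical argument names; the Glue script decides what it reads.
    return {"--source_bucket": bucket, "--source_key": key}


def lambda_handler(event, context, glue_client=None):
    if glue_client is None:
        import boto3  # imported lazily so the module loads without boto3

        glue_client = boto3.client("glue")
    # Starting the job returns quickly; the Glue job itself runs outside
    # Lambda, so Lambda's fifteen-minute limit does not apply to it.
    response = glue_client.start_job_run(
        JobName=GLUE_JOB_NAME,
        Arguments=build_job_arguments(event),
    )
    return {"JobRunId": response["JobRunId"]}
```

The Lambda only hands the work off, so it stays short-lived regardless of how long the Glue job runs.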

answered Oct 18 '22 01:10

Yuva

The answer to this can involve some foundational design decisions. What is this job doing? What kind of data are you dealing with? Should the task be executed in a batch or an event-oriented paradigm?

Batch

This may be necessary or desirable because the task:

Is being done over large monolithic data (e.g., binary).
Relies on context of multiple records in a dataset such that they must be loaded into a single job.
Order matters.

Just as often, though, I see batch handling chosen by default because "this is the way we've always done it", and breaking from this approach can be worth considering.

Glue is built for batch operations. With a current maximum execution time of 15 minutes and maximum memory of 10 GB, Lambda has also become capable of processing fairly large datasets in a single execution. It can be difficult to pin down a direct cost comparison without the specifics of the workload. When it comes to development, I feel that Lambda has the edge in tooling to build, test, and deploy.

Event

In the case where your data consists of a set of records, it might behoove you to parse and "stream" them into Lambda. Consider a flow like:

CSV lands in S3.
S3 event triggers Lambda.
Lambda reads and parses CSV into discrete events, submits to another Lambda or publishes to SNS for downstream processing. Concurrent instances of this Lambda can be employed to speed up ingest, where each instance is responsible for certain lines of the S3 object.
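The flow above can be sketched as a single Lambda handler, assuming the CSV has a header row and fits in memory. The topic ARN is a placeholder, and the s3_client and sns_client parameters are hypothetical additions so the handler can be run with stand-in clients.

```python
import csv
import io
import json
import urllib.parse

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:csv-rows"  # placeholder ARN


def parse_rows(csv_text):
    """Turn CSV text with a header row into one dict per record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def lambda_handler(event, context, s3_client=None, sns_client=None):
    if s3_client is None or sns_client is None:
        import boto3  # imported lazily so the module loads without boto3

        s3_client = s3_client or boto3.client("s3")
        sns_client = sns_client or boto3.client("sns")

    published = 0
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Each CSV row becomes its own message for downstream processing.
        for row in parse_rows(body.decode("utf-8")):
            sns_client.publish(TopicArn=TOPIC_ARN, Message=json.dumps(row))
            published += 1
    return {"published": published}
```

This sketch reads the whole object in one invocation. Splitting the object's lines across concurrent instances, as mentioned above, would need extra coordination that is not shown here.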

This pushes all logic and error handling, as well as the resources required, down to the level of the individual event/record. Often mechanisms such as dead-letter queues are employed for remediation. While the context of a given container persists across invocations - assuming the container has not been idle and torn down - Lambda should generally be considered stateless, such that the processing of an event/record is thought of as occurring within its own scope, outside that of others in the dataset.
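As a rough illustration of record-level error handling, the sketch below processes each record in isolation and sets failures aside instead of aborting the whole dataset. The in-memory dead_letters list merely stands in for a real dead-letter queue, and to_int_amount is a hypothetical example of per-record logic.

```python
def process_records(records, handle_record):
    """Process each record in isolation; collect failures separately."""
    succeeded, dead_letters = [], []
    for record in records:
        try:
            succeeded.append(handle_record(record))
        except Exception as exc:  # broad on purpose: isolate any per-record failure
            # Keep the failed record and the reason so it can be remediated later.
            dead_letters.append({"record": record, "error": str(exc)})
    return succeeded, dead_letters


def to_int_amount(record):
    """Hypothetical per-record logic: parse an integer 'amount' field."""
    return int(record["amount"])
```

One bad record then costs only that record, which is the main contrast with a batch job that fails as a unit.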

answered Oct 18 '22 01:10

ormus


Tags:

amazon-web-services

aws-lambda

aws-glue