AWS Lambda / AWS Batch workflow

I have written a Lambda that is triggered off an S3 bucket to unzip a zip file and process a text document inside. Due to the memory limitation of Lambda, I need to move my processing over to something like AWS Batch. Correct me if I am wrong, but my workflow should look something like this:

[workflow diagram]

I believe I need to write a Lambda to put the location of the S3 object on Amazon SQS, where an AWS Batch job can read the location and do all the unzipping/data processing there, where there is more memory.

Here is my current Lambda. It takes in the event triggered by the S3 bucket, checks to see if it is a zip file, then pushes the name of that S3 key to SQS. Should I tell AWS Batch to start reading the queue here in my Lambda? I am totally new to AWS in general and not sure where to go from here.

public class dockerEventHandler implements RequestHandler<S3Event, String> {

private static BigData app = new BigData();
private static DomainOfConstants CONST = new DomainOfConstants();
private static Logger log = Logger.getLogger(dockerEventHandler.class);

private static AmazonSQS SQS;
private static CreateQueueRequest createQueueRequest;
private static Matcher matcher;
private static String srcBucket, srcKey, extension, myQueueUrl;

@Override
public String handleRequest(S3Event s3Event, Context context) 
{
    try {
        for (S3EventNotificationRecord record : s3Event.getRecords())
        {
            srcBucket = record.getS3().getBucket().getName();
            srcKey = record.getS3().getObject().getKey().replace('+', ' ');
            srcKey = URLDecoder.decode(srcKey, "UTF-8");
            matcher = Pattern.compile(".*\\.([^\\.]*)").matcher(srcKey);

            if (!matcher.matches()) 
            {
                log.info(CONST.getNoConnectionMessage() + srcKey);
                return "";
            }
            extension = matcher.group(1).toLowerCase();

            if (!"zip".equals(extension)) 
            {
                log.info("Skipping non-zip file " + srcKey + " with extension " + extension);
                return "";
            }
            log.info("Sending object location " + srcBucket + "/" + srcKey);

            //pass in only the reference of where the object is located
            createQue(CONST.getQueueName(), srcKey);
        }
    }
    catch (IOException e)
    {
        log.error(e);           
    }
    return "Ok";
} 

/*
 * Set up the connection to Amazon SQS
 * TODO - Find updated API for SQS connection to eliminate deprecation
 */
@SuppressWarnings("deprecation")
public static void sQSConnection() {
    app.setAwsCredentials(CONST.getAccessKey(), CONST.getSecretKey());       
    try{
        SQS = new AmazonSQSClient(app.getAwsCredentials()); 
        Region usEast1 = Region.getRegion(Regions.US_EAST_1);
        SQS.setRegion(usEast1);
    } 
    catch(Exception e){
        log.error(e);       
    }
}

//Create the queue (if it does not already exist) and push the S3 key onto it
public static void createQue(String queName, String message){
    createQueueRequest = new CreateQueueRequest(queName);
    myQueueUrl = SQS.createQueue(createQueueRequest).getQueueUrl();
    sendMessage(myQueueUrl, message);
}

//Send a reference to the S3 object's location to the queue
public static void sendMessage(String SIMPLE_QUE_URL, String S3KeyName){
    SQS.sendMessage(new SendMessageRequest(SIMPLE_QUE_URL, S3KeyName));
}

//Fire AWS Batch to pull from the queue
private static void initializeBatch(){
    //TODO
}
}

I have set up Docker and understand Docker images. I believe my Docker image should contain all the code to read the queue, unzip, process and kick the file to RDS, all in one Docker image/container.
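
Roughly, I picture the container's entry point doing something like the sketch below: poll the queue for an object key, stream the zip out of S3, and extract it before processing. The queue URL and bucket name are placeholders, and I'm assuming the message body is just the S3 key, the way my Lambda above sends it.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnzipWorker {

    // Placeholder values -- point these at the real queue and bucket
    private static final String QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/unzip-queue";
    private static final String BUCKET = "my-upload-bucket";

    public static void main(String[] args) throws IOException {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Long-poll the queue once; the message body is the S3 key sent by the Lambda
        ReceiveMessageRequest receive = new ReceiveMessageRequest(QUEUE_URL)
                .withMaxNumberOfMessages(1)
                .withWaitTimeSeconds(20);

        for (Message message : sqs.receiveMessage(receive).getMessages()) {
            String key = message.getBody();

            // Stream the zip straight out of S3 and extract each entry under /tmp
            try (ZipInputStream zip = new ZipInputStream(s3.getObject(BUCKET, key).getObjectContent())) {
                ZipEntry entry;
                while ((entry = zip.getNextEntry()) != null) {
                    if (entry.isDirectory()) {
                        continue;
                    }
                    Path target = Paths.get("/tmp", entry.getName());
                    Files.createDirectories(target.getParent());
                    Files.copy(zip, target);
                    //TODO: process the extracted file and write the results to RDS
                }
            }
            sqs.deleteMessage(QUEUE_URL, message.getReceiptHandle());
        }
    }
}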

I am looking for someone who has done something similar that they could share to help. Something along the lines of:

Mr. S3: Hey Lambda, I have a file

Mr. Lambda: Okay S3, I see you. Hey AWS Batch, could you unzip and do stuff to this?

Mr. Batch: Gotcha Mr. Lambda, I'll take care of that and put it in RDS or some database after.

I have not written the class/Docker image yet, but I have all the code to process/unzip and kick off to RDS done. Lambda is just limited on memory since some of the files are 1 GB or bigger.

asked Jun 22 '17 by John Hanewich




1 Answer

Okay so after looking through the AWS docs on Batch, you don't need an SQS queue. Batch has a concept called Job Queue which is similar to an SQS FIFO queue, but different in that these job queues have priorities, and jobs within them can have dependencies on other jobs. The basic process is:

  1. First, the weird part is setting up IAM roles so that the container agents can talk to the container service, and so AWS Batch is able to launch instances when it needs to (there is also a separate role needed if you use Spot Instances). The details on the permissions required can be found in this doc (PDF) at around page 54.
  2. Now when that's done you setup a compute environment. These are EC2 on-demand or spot instances which hold your containers. Jobs operate on a container level. The idea is that your compute environment is the max resource allocation that your job containers can utilize. Once that limit is hit, your jobs have to wait for resources to be freed up.
  3. Now you create a job queue. This associates jobs with the compute environment you created.
  4. Now you create a job definition. Well, technically you don't have to and can do it through Lambda, but this makes things a bit easier. Your job definition will indicate what container resources will be needed for your job (you can of course override this in Lambda as well).
  5. Now that this is all done, you'll want to create a Lambda function. This will be triggered by your S3 bucket event. The function will need the IAM permissions to run SubmitJob against the Batch service (as well as any other permissions). Basically all the Lambda needs to do is call SubmitJob on AWS Batch; the basic parameters you'll want are the job queue and the job definition. You'll also pass the S3 key of the zip as a parameter to the job (see the sketch after this list).
  6. Now when the appropriate S3 event is triggered, it calls the Lambda, which then submits the job to the AWS Batch job queue. Then, assuming the setup is all good, Batch will happily pull up resources to process your job. Note that depending on the EC2 instance size and the container resources allocated, this may take a bit (much longer than prepping a Lambda function).
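
To make steps 5 and 6 concrete, here is a minimal sketch (AWS SDK for Java v1) of a Lambda that just calls SubmitJob when the S3 event fires. The job queue and job definition names are placeholders for whatever you created in steps 3 and 4, and the environment variable names are just one way of handing the object location to the container, not anything AWS requires.

import com.amazonaws.services.batch.AWSBatch;
import com.amazonaws.services.batch.AWSBatchClientBuilder;
import com.amazonaws.services.batch.model.ContainerOverrides;
import com.amazonaws.services.batch.model.KeyValuePair;
import com.amazonaws.services.batch.model.SubmitJobRequest;
import com.amazonaws.services.batch.model.SubmitJobResult;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.event.S3EventNotification.S3EventNotificationRecord;

public class UnzipJobSubmitter implements RequestHandler<S3Event, String> {

    // Placeholder names -- use the job queue and job definition you created in Batch
    private static final String JOB_QUEUE = "unzip-job-queue";
    private static final String JOB_DEFINITION = "unzip-job-def";

    private static final AWSBatch BATCH = AWSBatchClientBuilder.defaultClient();

    @Override
    public String handleRequest(S3Event s3Event, Context context) {
        for (S3EventNotificationRecord record : s3Event.getRecords()) {
            String bucket = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getKey();

            // Hand the object location to the container as environment variables
            ContainerOverrides overrides = new ContainerOverrides().withEnvironment(
                    new KeyValuePair().withName("SRC_BUCKET").withValue(bucket),
                    new KeyValuePair().withName("SRC_KEY").withValue(key));

            SubmitJobRequest request = new SubmitJobRequest()
                    .withJobName("unzip-" + System.currentTimeMillis())
                    .withJobQueue(JOB_QUEUE)
                    .withJobDefinition(JOB_DEFINITION)
                    .withContainerOverrides(overrides);

            SubmitJobResult result = BATCH.submitJob(request);
            context.getLogger().log("Submitted Batch job " + result.getJobId());
        }
        return "Ok";
    }
}

The container image then reads those environment variables (or job parameters, if you prefer) to know which object to fetch, unzip, and push to RDS.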
answered Oct 06 '22 by Chris White