Spring Batch - Read files from Aws S3

I am trying to read files from AWS S3 and process them with Spring Batch.

Can a Spring ItemReader handle this task? If so, how do I pass the credentials to the S3 client and configure my Spring XML to read one or more files?

<bean id="itemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="${aws.file.name}" />
</bean>
sve asked Jun 14 '15 16:06


2 Answers

Update: With Spring Cloud AWS you still use the FlatFileItemReader, but you no longer need to write a custom Resource implementation.

Instead, you set up an aws-context resource loader and give it your S3 client bean:

    <aws-context:context-resource-loader amazon-s3="amazonS3Client"/>

The reader is set up like any other reader. The only difference is that you now autowire a ResourceLoader:

@Autowired
private ResourceLoader resourceLoader;

and then use that ResourceLoader to resolve the S3 resource:

@Bean
public FlatFileItemReader<Map<String, Object>> awsItemReader() {
    FlatFileItemReader<Map<String, Object>> reader = new FlatFileItemReader<>();
    // Each JSON record in the file is mapped to a Map<String, Object>.
    reader.setLineMapper(new JsonLineMapper());
    reader.setRecordSeparatorPolicy(new JsonRecordSeparatorPolicy());
    // amazonS3Bucket and file are fields holding the bucket name and the object key.
    reader.setResource(resourceLoader.getResource("s3://" + amazonS3Bucket + "/" + file));
    return reader;
}
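
The reader above builds the location by concatenating "s3://", the bucket name, and the object key. Here is a minimal sketch of a helper that does the same while rejecting empty values and dropping a stray leading slash on the key. `S3Locations` and `toUri` are hypothetical names, not part of Spring or the AWS SDK, and dropping leading slashes assumes your object keys never start with one.

```java
final class S3Locations {

    private S3Locations() {
    }

    /**
     * Builds an "s3://bucket/key" location string.
     * Leading slashes on the key are dropped (an assumption of this sketch),
     * so the result never contains "//" right after the bucket name.
     */
    static String toUri(String bucket, String key) {
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("bucket must not be empty");
        }
        if (key == null) {
            throw new IllegalArgumentException("key must not be null");
        }
        String trimmedKey = key;
        while (trimmedKey.startsWith("/")) {
            trimmedKey = trimmedKey.substring(1);
        }
        if (trimmedKey.isEmpty()) {
            throw new IllegalArgumentException("key must not be empty");
        }
        return "s3://" + bucket + "/" + trimmedKey;
    }

    public static void main(String[] args) {
        // prints "s3://my-bucket/input/data.json"
        System.out.println(toUri("my-bucket", "/input/data.json"));
    }
}
```

With such a helper, the reader line could become `reader.setResource(resourceLoader.getResource(S3Locations.toUri(amazonS3Bucket, file)));`.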

Original answer: I would use the FlatFileItemReader. The only customization needed is your own S3 Resource object: extend Spring's AbstractResource to create an AWS resource that holds the AmazonS3 client, the bucket, the file path, and so on.

For getInputStream(), use the AWS Java SDK:

    S3Object object = s3Client.getObject(new GetObjectRequest(bucket, awsFilePath));
    return object.getObjectContent();

Then for contentLength():

    return s3Client.getObjectMetadata(bucket, awsFilePath).getContentLength();

and for lastModified():

    return s3Client.getObjectMetadata(bucket, awsFilePath).getLastModified().getTime();

The Resource you create holds the AmazonS3Client, which carries everything your Spring Batch app needs to communicate with S3. Here is what it could look like with Java config, where AmazonS3Resource is the custom class described above:

    reader.setResource(new AmazonS3Resource(amazonS3Client, amazonS3Bucket, inputFile));
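
To show how the three fragments above fit into one class, here is a rough sketch of the delegation shape only. So that it compiles without Spring or the AWS SDK on the classpath, it does not extend AbstractResource and does not call AmazonS3. `ObjectStore`, `StoreBackedResource`, and `InMemoryStore` are hypothetical stand-ins. In a real project the class would extend AbstractResource and forward to `s3Client.getObject(...)` and `s3Client.getObjectMetadata(...)` as shown earlier.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

/** Hypothetical stand-in for the AmazonS3 calls used above (getObject, getObjectMetadata). */
interface ObjectStore {
    InputStream open(String bucket, String key);

    long contentLength(String bucket, String key);

    long lastModifiedMillis(String bucket, String key);
}

/** Carries client, bucket and key, and delegates each call, as the custom Resource would. */
final class StoreBackedResource {
    private final ObjectStore store;
    private final String bucket;
    private final String key;

    StoreBackedResource(ObjectStore store, String bucket, String key) {
        this.store = store;
        this.bucket = bucket;
        this.key = key;
    }

    InputStream getInputStream() {
        return store.open(bucket, key);
    }

    long contentLength() {
        return store.contentLength(bucket, key);
    }

    long lastModified() {
        return store.lastModifiedMillis(bucket, key);
    }

    /** Last path segment of the key, e.g. "data.csv" for "in/data.csv". */
    String getFilename() {
        return key.substring(key.lastIndexOf('/') + 1);
    }

    String getDescription() {
        return "S3 resource [bucket='" + bucket + "', key='" + key + "']";
    }
}

/** In-memory ObjectStore holding a single object; only there to exercise the sketch. */
final class InMemoryStore implements ObjectStore {
    private final byte[] content;
    private final long modifiedMillis;

    InMemoryStore(byte[] content, long modifiedMillis) {
        this.content = content.clone();
        this.modifiedMillis = modifiedMillis;
    }

    @Override
    public InputStream open(String bucket, String key) {
        return new ByteArrayInputStream(content);
    }

    @Override
    public long contentLength(String bucket, String key) {
        return content.length;
    }

    @Override
    public long lastModifiedMillis(String bucket, String key) {
        return modifiedMillis;
    }
}
```

Keeping the client behind a small interface like this also makes the resource easy to exercise in a unit test without touching S3.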
mtoutcalt answered Nov 12 '22 18:11


A simpler approach:

  1. Create an AmazonS3 client bean.
  2. Create a ResourceLoader bean.
  3. Use the ResourceLoader to resolve S3 resources.

First, create the AmazonS3 client and ResourceLoader beans in your AWS configuration class, like this:

@Configuration
@EnableContextResourceLoader
public class AWSConfiguration {

    @Bean
    @Primary
    public AmazonS3 getAmazonS3Client() {
        ClientConfiguration config = new ClientConfiguration();

        // Both timeouts are in milliseconds: 5000 * 10 = 50 seconds.
        config.setConnectionTimeout(5000 * 10);
        config.setSocketTimeout(5000 * 10);

        // No credentials are set explicitly, so the SDK's default
        // credentials provider chain is used.
        return AmazonS3ClientBuilder.standard()
                .withClientConfiguration(config)
                .build();
    }

    // Lets Spring resolve "s3://" locations through the S3 client.
    @Bean
    public static ResourceLoaderBeanPostProcessor resourceLoaderBeanPostProcessor(
            AmazonS3 amazonS3EncryptionClient) {
        return new ResourceLoaderBeanPostProcessor(amazonS3EncryptionClient);
    }

}

Then use the ResourceLoader bean in the ItemReader to resolve S3 resources:

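@Autowired
private ResourceLoader resourceLoader;

@Bean
public FlatFileItemReader<Map<String, Object>> fileItemReader() {
    // JsonLineMapper produces Map<String, Object> items, so the reader is typed to match.
    FlatFileItemReader<Map<String, Object>> reader = new FlatFileItemReader<>();
    reader.setLineMapper(new JsonLineMapper()); // Change the line mapper as needed
    // amazonS3Bucket and file are fields holding the bucket name and the object key.
    reader.setResource(resourceLoader.getResource("s3://" + amazonS3Bucket + "/" + file));
    return reader;
}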
Gaurav Raghav answered Nov 12 '22 19:11
