Run Apache Flink with Amazon S3

Has anyone succeeded in using Apache Flink 0.9 to process data stored on AWS S3? I found that Flink uses its own S3FileSystem instead of the one from Hadoop, and it doesn't seem to work. When I use the path s3://bucket.s3.amazonaws.com/folder, it fails with the following exception:

java.io.IOException: Cannot establish connection to Amazon S3: com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403;

asked Oct 06 '15 by Konstantin Kudryavtsev


1 Answer

Update May 2016: The Flink documentation now has a page on how to use Flink with AWS


The question has been asked on the Flink user mailing list as well and I've answered it over there: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Processing-S3-data-with-Apache-Flink-td3046.html

tl;dr:

Flink program

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class S3FileSystem {
   public static void main(String[] args) throws Exception {
      ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
      // s3n:// paths are handled by Hadoop's NativeS3FileSystem (configured below)
      DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
      myLines.print();
   }
}
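
The same job can also write its results back to S3. A minimal sketch (not from the original answer; the bucket and output path are placeholders), using the regular execution environment so the code also works when submitted to a cluster:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class S3RoundTrip {
   public static void main(String[] args) throws Exception {
      // Picks the local or cluster environment automatically
      ExecutionEnvironment ee = ExecutionEnvironment.getExecutionEnvironment();

      // Read from S3 via the s3n:// scheme (Hadoop NativeS3FileSystem)
      DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");

      // Write the lines back to another S3 location (placeholder path)
      myLines.writeAsText("s3n://my-bucket-name/output");

      ee.execute("S3 round trip");
   }
}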

Add the following to core-site.xml and make it available to Flink:

<property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>putKeyHere</value>
</property>

<property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>putSecretHere</value>
</property>

<property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>
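
To make core-site.xml visible to Flink, one option (a sketch, assuming a standalone setup; the directory path is a placeholder) is to point fs.hdfs.hadoopconf in conf/flink-conf.yaml at the directory that contains it:

# conf/flink-conf.yaml
fs.hdfs.hadoopconf: /path/to/hadoop/conf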
answered by Robert Metzger