Has anyone succeeded in using Apache Flink 0.9 to process data stored on AWS S3? I found that Flink uses its own S3FileSystem instead of the one from Hadoop, and it doesn't seem to work. When I use the path s3://bucket.s3.amazonaws.com/folder it fails with the following exception:
java.io.IOException: Cannot establish connection to Amazon S3: com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403;
Update May 2016: The Flink documentation now has a page on how to use Flink with AWS
The question has been asked on the Flink user mailing list as well and I've answered it over there: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Processing-S3-data-with-Apache-Flink-td3046.html
tl;dr:
Flink program:
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class S3FileSystem {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
        // Use Hadoop's s3n:// scheme so the NativeS3FileSystem configured below is picked up
        DataSet<String> myLines = ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
        myLines.print();
    }
}
Add the following to core-site.xml and make it available to Flink (see the note after the snippet for one way to point Flink at it):
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>putKeyHere</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>putSecretHere</value>
</property>
<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>
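One way to make that file visible to Flink is to point the fs.hdfs.hadoopconf key in flink-conf.yaml at the directory containing core-site.xml. A minimal sketch, assuming core-site.xml lives in /path/to/hadoop-conf (a placeholder path):

# flink-conf.yaml
# Directory that contains core-site.xml (example path, adjust to your setup)
fs.hdfs.hadoopconf: /path/to/hadoop-conf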