In NiFi is it possible to read selectively through FetchS3Object processor?

In Apache NiFi, when I use FetchS3Object to read from an S3 bucket, I see that it reads all the objects in the bucket, including new ones as they are added. Two questions:

  1. Is it possible to configure the processor to read only objects added from now on, not those already present?
  2. How can I make it read only a particular folder in the bucket?

NiFi seems great; its documentation is just missing examples for at least the popular processors.

Sammy asked Jan 21 '17 17:01


1 Answer

A combination of ListS3 and FetchS3Object processors will do this:

  1. ListS3 - to enumerate your S3 bucket and generate a flowfile referencing each object. You can set the Prefix property to a particular folder in the bucket so that only that subset is enumerated. ListS3 keeps track of what it has already listed using NiFi's state feature, so on later runs it generates flowfiles only for objects newly added to the bucket.
  2. FetchS3Object - to read S3 objects into flowfile content. You can consume the output of ListS3 by setting FetchS3Object's Bucket property to ${s3.bucket} and its Object Key property to ${filename}.
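
The list-then-fetch pattern above can be pictured with a small, self-contained Python sketch. It is a conceptual model only, not NiFi's code: the names `S3Object`, `ListingState`, and `list_new_objects` are hypothetical, the bucket is a plain in-memory list, and the state tracking (latest timestamp plus the keys seen at that timestamp) is an assumption about how such a listing could remember its position. Each returned dict stands in for the `s3.bucket` and `filename` attributes that FetchS3Object would read.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one object in a bucket."""
    key: str
    last_modified: int  # e.g. epoch milliseconds


@dataclass
class ListingState:
    """Hypothetical persisted state: how far the listing has progressed."""
    latest_timestamp: int = -1
    keys_at_latest: set = field(default_factory=set)


def list_new_objects(bucket_name, objects, prefix, state):
    """Return attribute dicts for objects under `prefix` not listed before.

    Updates `state` in place so that a later call skips what was
    already emitted.
    """
    flowfiles = []
    # Keep only keys under the prefix, oldest first, so state advances in order.
    candidates = sorted(
        (o for o in objects if o.key.startswith(prefix)),
        key=lambda o: (o.last_modified, o.key),
    )
    for obj in candidates:
        if obj.last_modified < state.latest_timestamp:
            continue  # older than anything already listed
        if (obj.last_modified == state.latest_timestamp
                and obj.key in state.keys_at_latest):
            continue  # same timestamp, already listed
        flowfiles.append({"s3.bucket": bucket_name, "filename": obj.key})
        if obj.last_modified > state.latest_timestamp:
            state.latest_timestamp = obj.last_modified
            state.keys_at_latest = {obj.key}
        else:
            state.keys_at_latest.add(obj.key)
    return flowfiles


if __name__ == "__main__":
    state = ListingState()
    bucket = [S3Object("logs/a.txt", 100), S3Object("other/b.txt", 150)]
    print(list_new_objects("my-bucket", bucket, "logs/", state))
    bucket.append(S3Object("logs/c.txt", 200))
    print(list_new_objects("my-bucket", bucket, "logs/", state))
```

In this model the first call emits everything already under the prefix, and later calls emit only objects that arrived since. In NiFi itself the same wiring is done by connecting ListS3's success relationship to FetchS3Object rather than by writing code.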


James answered Sep 29 '22 07:09
