In Apache NiFi, using FetchS3Object to read from an S3 bucket, I see it can reads all the object in bucket and as they are added. Is it possible:
- To configure the processor to read only objects added now onwards, not the one already present?
- How can I make it read a particular folder in the bucket?
NiFi seems great, just missing examples in their documentation for atleast the popular processors.
A combination of ListS3 and FetchS3Object processors will do this:
-
ListS3 - to enumerate your S3 bucket and generate flowfiles referencing each object. You can configure the Prefix property to specify a particular folder in the bucket to enumerate only a subset. ListS3 keeps track of what it has read using NiFi's state feature, so it will generate new flowfiles as new objects are added to the bucket.
-
FetchS3Object - to read S3 objects into flowfile content. You can use the output of ListS3 by configuring the FetchS3Object's Bucket property to
${s3.bucket}
and Object Key property to ${filename}
.