
GetFile processor continually running in Apache NiFi

Tags:

apache-nifi

I am using Apache NiFi to copy some local files to HDFS. I've created a GetFile processor connected to a PutHDFS processor. The GetFile processor recursively polls a read-only directory. The problem is that the files are copied over and over: for example, if I delete the copied files on HDFS, they reappear shortly afterwards.

In the GetFile processor configuration, I've set Keep Source File to true because the directory is read-only. According to the documentation, if this property is set to false and NiFi lacks write permission on the directory, the files are ignored:

If true, the file is not deleted after it has been copied to the Content Repository; this causes the file to be picked up continually and is useful for testing purposes. If not keeping original NiFi will need write permissions on the directory it is pulling from otherwise it will ignore the file.

Is it possible to simply copy each file once?

Asked by cheseaux, Nov 14 '16 10:11


1 Answer

You can use the ListFile and FetchFile processors to do this. ListFile will keep track of which files it has seen so far, and will not continue to list them unless they have been modified. Make sure you set Completion Strategy in FetchFile to "None", to ensure no attempt is made to move/delete the file.
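To illustrate the idea behind "keeping track of which files it has seen", here is a minimal Python sketch of listing with remembered modification times. It is not NiFi's implementation: the class name `ListingTracker` and the method `list_new` are hypothetical, and the in-memory dictionary only stands in for whatever state ListFile actually persists.

```python
import os
from typing import Dict, List


class ListingTracker:
    """Hypothetical stand-in for the 'already seen' state a listing step keeps."""

    def __init__(self) -> None:
        # Maps file path -> modification time recorded when it was last listed.
        self._seen: Dict[str, float] = {}

    def list_new(self, root: str) -> List[str]:
        """Return files under root that are new or modified since the last call."""
        new_files: List[str] = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                mtime = os.stat(path).st_mtime
                previous = self._seen.get(path)
                # List the file only if it was never seen or has changed since.
                if previous is None or mtime > previous:
                    self._seen[path] = mtime
                    new_files.append(path)
        return sorted(new_files)
```

Calling `list_new` twice on an unchanged directory returns the files the first time and an empty list the second time. The source directory is only read, never written to, which is why this approach suits a read-only directory.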

Answered by mattyb, Sep 28 '22 14:09