I am using Apache NiFi to copy some local files to HDFS. I've created a GetFile processor connected to a PutHDFS processor. The GetFile processor recursively polls a read-only directory. The problem I'm encountering is that files are copied continually: for example, if I delete the copied files on HDFS, they reappear shortly after.
In the GetFile processor configuration, I've set Keep Source File to true because the directory is read-only. According to the documentation, if this parameter is set to false and the directory is read-only, the files are ignored:
If true, the file is not deleted after it has been copied to the Content Repository; this causes the file to be picked up continually and is useful for testing purposes. If not keeping original NiFi will need write permissions on the directory it is pulling from otherwise it will ignore the file.
Is it possible to simply copy each file once?
You can use the ListFile and FetchFile processors to do this. ListFile will keep track of which files it has seen so far, and will not continue to list them unless they have been modified. Make sure you set Completion Strategy in FetchFile to "None", to ensure no attempt is made to move/delete the file.
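To illustrate why tracking state avoids the repeated copies, here is a minimal Python sketch of the general idea: remember each file's modification time and only report a file again if that time changes. This is not NiFi's actual implementation; the function name `list_new_files` and the `seen` dictionary are hypothetical and exist only for this example.

```python
import os


def list_new_files(root, seen):
    """Return paths under `root` that are new or modified since the last call.

    `seen` is a dict mapping path -> last observed modification time (ns).
    It is updated in place, playing the role of the listing state.
    The source directory is only read, never written to.
    """
    fresh = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime_ns
            # Report the file only if it is unseen or its mtime changed.
            if seen.get(path) != mtime:
                seen[path] = mtime
                fresh.append(path)
    return fresh
```

Calling `list_new_files` twice on an unchanged directory with the same `seen` dictionary returns the files the first time and an empty list the second time, which is the behaviour you want here: each file is handed downstream once, and deleting the copy on HDFS does not trigger another transfer.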