I have a bunch of binary files compressed into *.gz format. They are generated on a remote node and must be transferred to HDFS, which lives on one of the datacenter's servers.
I'm exploring the option of sending the files with Flume. I looked at the Spooling Directory source, but apparently it only works when the spooling directory is local to the node the agent (and its HDFS sink) runs on.
Any suggestions on how to tackle this problem?
Why not run two different Flume agents: one on the remote machine and one on your datanode? The agent on the remote machine reads the spooling directory and forwards events through an Avro sink, and the agent on the datanode receives them through an Avro source and writes the data to HDFS.
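A minimal sketch of what the two agent configurations could look like. The directory /data/outgoing, the hostname datanode.example.com, the namenode address, and port 4545 are all placeholders you would replace with your own values:

```
# --- Agent on the remote machine: spooling directory -> Avro sink ---
remote.sources  = spool-src
remote.channels = mem-ch
remote.sinks    = avro-out

remote.sources.spool-src.type     = spooldir
remote.sources.spool-src.spoolDir = /data/outgoing
# Treat each .gz file as a single binary event instead of splitting it into lines
# (BlobDeserializer ships with the morphline-solr-sink module).
remote.sources.spool-src.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
remote.sources.spool-src.channels = mem-ch

remote.channels.mem-ch.type     = memory
remote.channels.mem-ch.capacity = 10000

remote.sinks.avro-out.type     = avro
remote.sinks.avro-out.hostname = datanode.example.com
remote.sinks.avro-out.port     = 4545
remote.sinks.avro-out.channel  = mem-ch

# --- Agent on the datanode: Avro source -> HDFS sink ---
collector.sources  = avro-in
collector.channels = mem-ch
collector.sinks    = hdfs-out

collector.sources.avro-in.type     = avro
collector.sources.avro-in.bind     = 0.0.0.0
collector.sources.avro-in.port     = 4545
collector.sources.avro-in.channels = mem-ch

collector.channels.mem-ch.type     = memory
collector.channels.mem-ch.capacity = 10000

collector.sinks.hdfs-out.type          = hdfs
collector.sinks.hdfs-out.hdfs.path     = hdfs://namenode.example.com:8020/flume/ingest
# DataStream writes the event bodies as-is rather than wrapping them in SequenceFiles
collector.sinks.hdfs-out.hdfs.fileType = DataStream
collector.sinks.hdfs-out.channel       = mem-ch
```

Each agent is then started with the standard launcher, e.g. flume-ng agent --conf conf --conf-file remote.conf --name remote on the remote machine, and the same command with collector.conf and --name collector on the datanode.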
There is no out-of-the-box solution for this case, but the two-agent setup described above is a reasonable workaround.