I've just started learning Big Data, and at the moment I'm working on Flume. The common example I've encountered is processing tweets (the example from Cloudera) using some Java.
Just for testing and simulation purposes, can I use my local file system as a Flume source? Particularly some Excel or CSV files? Do I also need to write Java code in addition to the Flume configuration file, as in the Twitter extraction example?
Will this source be event-driven or pollable?
Thanks for your input.
I assume you are using a Cloudera sandbox and are talking about putting a file on the sandbox, local to the Flume agent you are planning to kick off. A Flume agent contains a:

- Source
- Channel
- Sink

These should sit local to the Flume agent. The list of available Flume sources is in the user guide: https://flume.apache.org/FlumeUserGuide.html. You could use an Exec source if you just want to stream data from a file with a tail or cat command. You could also use a Spooling Directory Source, which watches a specified directory for new files and parses events out of new files as they appear. (The Exec source is event-driven, while the Spooling Directory Source polls the directory.) Have a good read of the user guide; it contains everything you need.
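As a sketch of what the configuration-only approach could look like, here is a minimal agent using a Spooling Directory Source with a logger sink for inspection. The agent name (agent1), component names, and the spool directory path are all placeholders you would adapt; drop your CSV files into the spool directory and Flume will ingest them as they appear.

```
# Hypothetical agent "agent1" -- names and paths are placeholders
agent1.sources = csvsource
agent1.channels = memchannel
agent1.sinks = logsink

# Spooling Directory Source: watches the directory for new files
agent1.sources.csvsource.type = spooldir
agent1.sources.csvsource.spoolDir = /home/cloudera/spool
agent1.sources.csvsource.channels = memchannel

# In-memory channel, fine for local testing
agent1.channels.memchannel.type = memory
agent1.channels.memchannel.capacity = 1000

# Logger sink prints events to the agent's log for inspection
agent1.sinks.logsink.type = logger
agent1.sinks.logsink.channel = memchannel
```

You could then start the agent with something like `flume-ng agent --conf conf --conf-file example.conf --name agent1 -Dflume.root.logger=INFO,console` (assuming the config above is saved as example.conf). No extra Java code is needed for this setup; the Twitter example requires Java only because it uses a custom source.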