
Using the local file system as a Flume source

Tags: java, flume

I've just started learning Big Data, and at the moment I'm working with Flume. The most common example I've come across is processing tweets (the example from Cloudera), which uses some Java code.

Just for testing and simulation purposes, can I use my local file system as a Flume source, particularly some Excel or CSV files? Do I also need to write some Java code in addition to the Flume configuration file, as in the Twitter example?

Will this source be event-driven or pollable?

Thanks for your input.

oikonomiyaki asked Sep 28 '22 01:09



1 Answer

I assume you are using a Cloudera sandbox and mean putting a file on the sandbox, local to the Flume agent you plan to start. A Flume agent contains a source, a channel, and a sink.

These should sit local to the Flume agent. The list of available Flume sources is in the user guide: https://flume.apache.org/FlumeUserGuide.html. You could use an Exec source if you just want to stream data from a file with a tail or cat command. You could also use a Spooling Directory Source, which watches the specified directory for new files and parses events out of new files as they appear. Have a good read of the user guide; it contains everything you need.
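For the CSV case in the question, a Spooling Directory Source needs only a configuration file, not custom Java. The sketch below is a small Java test harness written under stated assumptions: it writes a minimal agent config (spooldir source, memory channel, logger sink) and then drops a sample CSV into the spool directory. The agent name a1, the component names r1/c1/k1, the flume-test/ paths and the class SpoolDirHarness are all hypothetical. The file is written to a staging directory first and then moved in, because the Spooling Directory Source expects files to be complete and unmodified once they appear there, and file names must not be reused.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;

/**
 * Hypothetical test harness: writes a minimal Flume agent config that uses a
 * Spooling Directory Source, then drops CSV files into the spool directory.
 * Agent/component names and paths are illustrative assumptions.
 */
public class SpoolDirHarness {

    // Minimal agent: spooldir source -> memory channel -> logger sink.
    static String agentConfig(Path spoolDir) {
        return String.join("\n",
                "a1.sources = r1",
                "a1.channels = c1",
                "a1.sinks = k1",
                "a1.sources.r1.type = spooldir",
                "a1.sources.r1.spoolDir = " + spoolDir.toAbsolutePath(),
                "a1.sources.r1.channels = c1",
                "a1.channels.c1.type = memory",
                "a1.sinks.k1.type = logger",
                "a1.sinks.k1.channel = c1",
                "");
    }

    // Write the CSV outside the spool directory, then move it in atomically,
    // so the source never sees a half-written file. Staging and spool dirs
    // must be on the same file system for ATOMIC_MOVE to succeed.
    static Path dropCsv(Path stagingDir, Path spoolDir, String name, List<String> rows)
            throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(spoolDir);
        Path staged = stagingDir.resolve(name);
        Files.write(staged, rows, StandardCharsets.UTF_8);
        return Files.move(staged, spoolDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Paths.get(args.length > 0 ? args[0] : "flume-test");
        Path spool = base.resolve("spool");
        Path staging = base.resolve("staging");
        Files.createDirectories(base);
        Files.write(base.resolve("spooldir-agent.conf"),
                agentConfig(spool).getBytes(StandardCharsets.UTF_8));
        // A timestamped name avoids reusing a file name the source has already seen.
        Path dropped = dropCsv(staging, spool, "sales-" + System.currentTimeMillis() + ".csv",
                List.of("id,product,amount", "1,widget,9.99", "2,gadget,24.50"));
        System.out.println("Config written; dropped " + dropped);
    }
}
```

After running the harness, the agent can be started with something like bin/flume-ng agent --conf conf --conf-file flume-test/spooldir-agent.conf --name a1 -Dflume.root.logger=INFO,console. With the default LINE deserializer, each CSV row (including the header) becomes one event, and each fully processed file is renamed with a .COMPLETED suffix. That line-oriented default suits CSV rather than binary Excel workbooks.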

Colman answered Nov 15 '22 04:11
