 

Flume NG and HDFS

Tags:

hadoop

hdfs

flume

I am very new to Hadoop, so please excuse the dumb questions.

My understanding so far is that the best use case for Hadoop is large files, which makes MapReduce jobs run more efficiently.

Keeping the above in mind, I am somewhat confused about Flume NG. Assume I am tailing a log file and new lines are produced every second; the moment the log gets a new line, it will be transferred to HDFS via Flume.

a) Does this mean that Flume creates a new file in HDFS for every line logged in the file I am tailing, or does it append to an existing HDFS file?

b) Is append allowed in HDFS in the first place?

c) If the answer to b) is yes, i.e. contents are appended constantly, how and when should I run my MapReduce application?

The above questions may sound very silly, but answers to them would be highly appreciated.

PS: I have not set up Flume NG or Hadoop yet; I am just reading articles to get an understanding of how they could add value to my company.

asked Jul 18 '13 13:07 by user1103472


People also ask

What is Flume Ng?

Flume NG is a refactoring of Flume and was originally tracked in FLUME-728. From the JIRA's description: To solve certain known issues and limitations, Flume requires a refactoring of some core classes and systems.

What is Flume used for in Hadoop?

Apache Flume is a tool/service/data-ingestion mechanism for collecting, aggregating, and transporting large amounts of streaming data, such as log data and events, from various web servers to a centralized data store.

What is difference between Flume and Sqoop?

Apache Sqoop is used in Hadoop to fetch structured data from RDBMS systems like Teradata, Oracle, MySQL, MSSQL, and PostgreSQL, whereas Apache Flume is used to fetch data stored on various sources, such as the log files on a web server or an application server.

Why do we use Flume?

Flume is an open-source distributed data collection service used for transferring data from a source to a destination. It is a reliable and highly available service for collecting, aggregating, and transferring huge amounts of logs into HDFS. It has a simple and flexible architecture.


1 Answer

Flume writes to HDFS by means of the HDFS sink. When Flume starts and begins to receive events, the sink opens a new file and writes events into it. At some point the previously opened file has to be closed, and until then the data in the current block being written is not visible to other readers.

As described in the documentation, the Flume HDFS sink has several file-closing strategies (a configuration sketch follows the list):

  • every N seconds (specified by the rollInterval option)
  • after writing N bytes (rollSize option)
  • after writing N received events (rollCount option)
  • after N seconds of inactivity (idleTimeout option)
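
For illustration, here is a minimal sketch of an agent configuration that tails a log file with an exec source and writes to HDFS with these roll settings. The agent, source, channel, and sink names, the log path, and the HDFS URL are assumptions made for the example, not values from the question:

  # Hypothetical agent "agent1": exec source tailing a log, memory channel, HDFS sink
  agent1.sources = tailSrc
  agent1.channels = memCh
  agent1.sinks = hdfsSink

  agent1.sources.tailSrc.type = exec
  agent1.sources.tailSrc.command = tail -F /var/log/app.log
  agent1.sources.tailSrc.channels = memCh

  agent1.channels.memCh.type = memory
  agent1.channels.memCh.capacity = 10000

  agent1.sinks.hdfsSink.type = hdfs
  agent1.sinks.hdfsSink.channel = memCh
  agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/flume/logs
  # Roll a new file every 10 minutes, at ~128 MB, or after 100000 events,
  # whichever comes first; setting an option to 0 disables that trigger.
  agent1.sinks.hdfsSink.hdfs.rollInterval = 600
  agent1.sinks.hdfsSink.hdfs.rollSize = 134217728
  agent1.sinks.hdfsSink.hdfs.rollCount = 100000
  # Close the file after 5 minutes with no new events.
  agent1.sinks.hdfsSink.hdfs.idleTimeout = 300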

So, to your questions:

a) Flume writes events to the currently open file until it is closed (and a new file is opened).

b) Append is allowed in HDFS, but Flume does not use it: once a file is closed, Flume never appends any data to it.

c) To hide the currently open file from a MapReduce application, use the inUsePrefix option: files whose names start with . are not visible to MR jobs.
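
For example, continuing the hypothetical agent1 configuration sketched above:

  # Prefix in-progress files with a dot; FileInputFormat's default filter skips
  # files whose names start with "." or "_", and the prefix is dropped when the
  # file is closed and rolled.
  agent1.sinks.hdfsSink.hdfs.inUsePrefix = .

Note that by default Flume only marks in-progress files with a .tmp suffix, which MapReduce does not skip, so the prefix has to be set explicitly.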

answered Sep 19 '22 00:09 by Dmitry