I'm using impala with flume as filestream.
The problem is flume is adding temporary files with extension .tmp, and then when they are deleted impala queries are failing with the following message:
Backend 0:Failed to open HDFS file hdfs://localhost:8020/user/hive/../FlumeData.1420040201733.tmp Error(2): No such file or directory
How can I make impala to ignore this tmp files, or flume not to write them, or write them to another directory?
Flume configuration:
### Agent2 - Avro Source and File Channel, hdfs Sink ###
# Name the components on this agent
Agent2.sources = avro-source
Agent2.channels = file-channel
Agent2.sinks = hdfs-sink
# Describe/configure Source
Agent2.sources.avro-source.type = avro
Agent2.sources.avro-source.hostname = 0.0.0.0
Agent2.sources.avro-source.port = 11111
Agent2.sources.avro-source.bind = 0.0.0.0
# Describe the sink
Agent2.sinks.hdfs-sink.type = hdfs
Agent2.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/user/hive/table/
Agent2.sinks.hdfs-sink.hdfs.rollInterval = 0
Agent2.sinks.hdfs-sink.hdfs.rollCount = 10000
Agent2.sinks.hdfs-sink.hdfs.fileType = DataStream
#Use a channel which buffers events in file
Agent2.channels.file-channel.type = file
Agent2.channels.file-channel.checkpointDir = /home/ubutnu/flume/checkpoint/
Agent2.channels.file-channel.dataDirs = /home/ubuntu/flume/data/
# Bind the source and sink to the channel
Agent2.sources.avro-source.channels = file-channel
Agent2.sinks.hdfs-sink.channel = file-channel
I had this problem once.
I've upgraded hadoop and flume and it got solved. (from cloudera hadoop cdh-5.2 into cdh-5.3)
Try upgrading - hadoop, flume or impala.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With