Save flume output to hive table with Hive Sink

Tags: hadoop, hive, flume

I am trying to configure Flume with Hive so that Flume output is saved to a Hive table using the Hive Sink type. I have a single-node cluster and use the MapR Hadoop distribution.

Here is my flume.conf:

agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1

agent1.sources.source1.type = exec
agent1.sources.source1.command = cat /home/andrey/flume_test.data

agent1.sinks.sink1.type = hive
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.hive.metastore = thrift://127.0.0.1:9083
agent1.sinks.sink1.hive.database = default
agent1.sinks.sink1.hive.table = flume_test
agent1.sinks.sink1.useLocalTimeStamp = false
agent1.sinks.sink1.round = true
agent1.sinks.sink1.roundValue = 10
agent1.sinks.sink1.roundUnit = minute
agent1.sinks.sink1.serializer = DELIMITED
agent1.sinks.sink1.serializer.delimiter = "," 
agent1.sinks.sink1.serializer.serdeSeparator = ','
agent1.sinks.sink1.serializer.fieldnames = id,message

agent1.channels.channel1.type = FILE
agent1.channels.channel1.transactionCapacity = 1000000
agent1.channels.channel1.checkpointInterval = 30000
agent1.channels.channel1.maxFileSize = 2146435071
agent1.channels.channel1.capacity = 10000000
agent1.sources.source1.channels = channel1
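
For reference, the agent can be started with the standard flume-ng launcher; the config directory and file path below are placeholders, not necessarily the exact ones on my machine:

flume-ng agent --conf /path/to/conf --conf-file /path/to/flume.conf --name agent1 -Dflume.root.logger=INFO,console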

My data file flume_test.data:

1,AAAAAAAA
2,BBBBBBB
3,CCCCCCCC
4,DDDDDD
5,EEEEEEE
6,FFFFFFFFFFF
7,GGGGGG
8,HHHHHHH
9,IIIIII
10,JJJJJJ
11,KKKKKK
12,LLLLLLLL
13,MMMMMMMMM
14,NNNNNNNNN
15,OOOOOOOO
16,PPPPPPPPPP
17,QQQQQQQ
18,RRRRRRR
19,SSSSSSSS

This is how I create my table in Hive:

create table flume_test(id string, message string)
clustered by (message) into 1 buckets
STORED AS ORC tblproperties ("orc.compress"="NONE");
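
As far as I understand, the Hive sink writes through the Hive streaming API, which expects a bucketed ORC table with ACID enabled, so a variant of the same DDL with the transactional property added would look like this (just a sketch, not something I have verified on MapR):

create table flume_test(id string, message string)
clustered by (message) into 1 buckets
stored as orc
tblproperties ("orc.compress"="NONE", "transactional"="true");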

When I use only 1 bucket, the select * from flume_test command in the Hive shell returns only the OK status, without any data. If I use more than 1 bucket, it returns error messages.

For example, the errors with 5 buckets after selecting from the Hive table:

hive> select * from flume_test;
OK
2015-06-18 10:04:57,6909 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
2015-06-18 10:04:57,6941 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
2015-06-18 10:04:57,6976 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
2015-06-18 10:04:57,7044 ERROR Client fs/client/fileclient/cc/client.cc:1385 Thread: 10141 Open failed for file /user/hive/warehouse/flume_test/delta_0004401_0004500/bucket_00, LookupFid error No such file or directory(2)
Time taken: 0.914 seconds

The Hive table data is saved in the /user/hive/warehouse/flume_test directory, and it is not empty.

-rwxr-xr-x   3 andrey andrey          4 2015-06-17 16:28 /user/hive/warehouse/flume_test/_orc_acid_version
drwxr-xr-x   - andrey andrey          2 2015-06-17 16:28 /user/hive/warehouse/flume_test/delta_0004301_0004400

The delta directory contains:

-rw-r--r--   3 andrey andrey        991 2015-06-17 16:28 /user/hive/warehouse/flume_test/delta_0004301_0004400/bucket_00000
-rwxr-xr-x   3 andrey andrey          8 2015-06-17 16:28 /user/hive/warehouse/flume_test/delta_0004301_0004400/bucket_00000_flush_length

I can't read the /user/hive/warehouse/flume_test/delta_0004301_0004400/bucket_00000 ORC file, even with Pig.
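
One way to sanity-check that file directly, assuming the stock Hive CLI is available, is Hive's ORC dump tool:

hive --orcfiledump /user/hive/warehouse/flume_test/delta_0004301_0004400/bucket_00000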

I also tried to set these variables after table creation in Hive, but it didn't help:

set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on = true;
set hive.compactor.worker.threads = 2;
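
As far as I know, these transaction/compactor settings (plus hive.support.concurrency) normally have to go into hive-site.xml so that the metastore and compactor pick them up, not just into a shell session; a minimal sketch of the hive-site.xml entries:

<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>2</value>
</property>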

I found a few examples on the internet, but they are incomplete, and I am new to Flume, so I can't make sense of them.

asked Jun 18 '15 by Andrey Braslavskiy

1 Answer

Adding these two lines to my config solved the problem, although I still get errors when reading the table from Hive. I can read the table and it returns the correct result, but with errors.

# hive.txnsPerBatchAsk: number of transactions the sink requests per Hive transaction batch
# batchSize: max number of events written to Hive in a single Hive transaction
agent1.sinks.sink1.hive.txnsPerBatchAsk = 2
agent1.sinks.sink1.batchSize = 10
answered Nov 14 '22 by Andrey Braslavskiy