Does Hadoop streaming support the new columnar storage formats like ORC and Parquet, or are there frameworks on top of Hadoop that allow you to read such formats?
You can use HCatalog to read ORC files: https://cwiki.apache.org/confluence/display/Hive/HCatalog+UsingHCat
It gives you an abstraction for reading ORC, Text, Sequence, and RC files; I am not sure whether Parquet is supported there. If that doesn't fit your needs, you can also use the ORC record readers from the Hive code base directly to read ORC files (OrcInputFormat, OrcOutputFormat).
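As a rough illustration of the second option, here is a minimal sketch that reads an ORC file row by row with the Hive reader API (org.apache.hadoop.hive.ql.io.orc), assuming Hive 0.13+ with hive-exec on the classpath; the class name OrcDump and the command-line argument are just placeholders for this example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.ql.io.orc.OrcFile;
    import org.apache.hadoop.hive.ql.io.orc.Reader;
    import org.apache.hadoop.hive.ql.io.orc.RecordReader;

    public class OrcDump {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // args[0] is a path to an ORC file on HDFS or the local FS (placeholder)
            Reader reader = OrcFile.createReader(new Path(args[0]),
                                                 OrcFile.readerOptions(conf));
            RecordReader rows = reader.rows();   // iterates over the file's rows
            Object row = null;
            while (rows.hasNext()) {
                row = rows.next(row);            // returns an OrcStruct
                System.out.println(row);         // OrcStruct.toString() prints the fields
            }
            rows.close();
        }
    }

In a MapReduce job you would instead plug OrcInputFormat/OrcOutputFormat into the job configuration, but the row-reading loop above shows what those record readers hand you.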
Rather old news, but I struggled with this some time ago. I did not find any solution, so I wrote a set of input/output formats that convert Avro and Parquet files to and from plain text and JSON. It can be found at http://github.com/whale2/iow-hadoop-streaming. There's no ORC support, but Avro and Parquet are covered. Hope this helps.
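For context, a streaming job with such a library would look roughly like the command below. The exact input-format class name, jar names, and paths are assumptions for illustration only; check the repo's README for the actual class names it ships:

    hadoop jar hadoop-streaming.jar \
        -libjars iow-hadoop-streaming.jar \
        -inputformat net.iponweb.hadoop.streaming.parquet.ParquetAsTextInputFormat \
        -input /data/events_parquet \
        -output /data/events_out \
        -mapper my_mapper.py \
        -reducer my_reducer.py \
        -file my_mapper.py -file my_reducer.py

The input format converts each Parquet record to a line of text, so the mapper and reducer can stay plain stdin/stdout scripts as in any other streaming job.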