 

How to see the contents of Hive ORC files in Linux


Is there a way to see the contents of an ORC file, the format that Hive 0.11 and above use? I usually cat gz files and decompress them to see the contents, e.g.:

cat part-0000.gz | pigz -d | more

Note: pigz is a parallel gzip program.

I would like to know if there is something similar to this for ORC files.

viper asked Dec 30 '13 20:12



People also ask

How do I view ORC files?

To read ORC files, use the OrcFile class to create a Reader that contains the metadata about the file. There are a few options to the ORC reader, but far fewer than the writer and none of them are required. The reader has methods for getting the number of rows, schema, compression, etc. from the file.

Does Hive support ORC file format?

The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.


1 Answer

Updated answer (2020):

Per @Owen's answer, ORC has grown up and matured as its own Apache project. The complete list of ORC Adopters shows how widely it is now supported across many Big Data technologies.

Credit to @Owen and the Apache ORC project team: the ORC project site has fully maintained, up-to-date documentation on using either the Java or the C++ standalone tools on ORC files stored on a local Linux file system. That documentation carries the torch for the original Hive+ORC Apache wiki page.
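
As a rough sketch of what the standalone Java tools look like in practice: the ORC project publishes an orc-tools "uber" jar whose meta sub-command prints file metadata and whose data sub-command prints the rows. The wrapper name orc_tool and the variables ORC_TOOLS_JAR and JAVA_BIN below are assumptions made for illustration, and the jar location depends on where you downloaded it, so check the ORC site for the exact version and options.

```shell
# Sketch only: orc_tool is a hypothetical wrapper around the orc-tools uber jar.
# ORC_TOOLS_JAR must be set to the path of the downloaded jar (assumed, not standard).
# JAVA_BIN can override the java binary, which also makes a dry run possible.
orc_tool() {
  # $1 = sub-command (meta or data), $2 = path to a local ORC file
  "${JAVA_BIN:-java}" -jar "${ORC_TOOLS_JAR:?set ORC_TOOLS_JAR to the orc-tools uber jar}" "$1" "$2"
}
```

With ORC_TOOLS_JAR exported, something like "orc_tool data part-0000.orc | more" would page through the rows in the same spirit as the gz pipeline in the question (part-0000.orc is a made-up file name).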

Original answer dated: May 30 '14 at 16:27

The ORC file dump utility comes with Hive (0.11 or higher):

hive --orcfiledump <hdfs-location-of-orc-file>
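
To get closer to the cat | pigz -d | more workflow from the question, here is a minimal sketch. It assumes Hive 1.1.0 or later, where (per the Hive documentation, as far as I know) --orcfiledump accepts -d to dump the row data instead of only the file metadata. The function name orc_rows and the HIVE_BIN override are hypothetical helpers added for illustration.

```shell
# Sketch only: orc_rows is a hypothetical helper, not part of Hive.
# Assumes Hive 1.1.0+ so that --orcfiledump understands -d (dump row data).
# HIVE_BIN can override the hive binary, e.g. for a dry run.
orc_rows() {
  # $1 = location of the ORC file
  "${HIVE_BIN:-hive}" --orcfiledump -d "$1"
}
```

Usage would then be along the lines of "orc_rows <hdfs-location-of-orc-file> | more". On Hive 0.11 through 0.14 the -d flag is not available, and the plain command above prints metadata only.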


geekyj answered Sep 18 '22 13:09
