Is there a way to see the contents of an ORC file, the format that Hive 0.11 and above use? I usually cat gz files and decompress them to see the contents, e.g.:
cat part-0000.gz | pigz -d | more
Note: pigz is a parallel gzip program.
I would like to know if there is something similar to this for ORC files.
To read ORC files, use the OrcFile class to create a Reader that contains the metadata about the file. There are a few options to the ORC reader, but far fewer than the writer and none of them are required. The reader has methods for getting the number of rows, schema, compression, etc. from the file.
The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.
Updated answer in year 2020:
Per @Owen's answer, ORC has grown up and matured as its own Apache project. The full list of ORC adopters shows how widely it is now supported across many Big Data technologies.
Credit to @Owen and the Apache ORC project team: the ORC project site has fully maintained, up-to-date documentation on using either the Java or C++ standalone tools on ORC files stored on a local Linux file system. It carries on the torch from the original Hive+ORC Apache wiki page.
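As a rough sketch of what the standalone Java tool looks like in practice, assuming you have downloaded an orc-tools uber jar from the ORC project: the wrapper names orc_meta and orc_data and the ORC_TOOLS_JAR variable are made up for this example, and the jar path is a placeholder. Only the meta and data subcommands come from the ORC tools themselves.

```shell
# ORC_TOOLS_JAR is a hypothetical variable pointing at a downloaded
# orc-tools uber jar; the default below is a placeholder, not a real path.
ORC_TOOLS_JAR="${ORC_TOOLS_JAR:-/path/to/orc-tools-uber.jar}"

# Print file metadata (schema, row count, compression, stripe statistics).
orc_meta() {
  java -jar "$ORC_TOOLS_JAR" meta "$1"
}

# Print the rows of the file as JSON.
orc_data() {
  java -jar "$ORC_TOOLS_JAR" data "$1"
}
```

With that in place, something like orc_data part-0000.orc | more should behave much like the gz pipeline in the question.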
Original answer dated: May 30 '14 at 16:27
The ORC file dump utility comes with Hive (0.11 or higher). By default it prints the file's metadata rather than the rows themselves:
hive --orcfiledump <hdfs-location-of-orc-file>
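To get closer to the cat | pigz -d | more workflow from the question, here is a minimal sketch that assumes Hive 1.1.0 or later, where --orcfiledump accepts a -d flag to print the row data. The function name orc_peek and its row-limit argument are invented for illustration and are not part of Hive.

```shell
# orc_peek is a hypothetical helper, not something Hive ships.
# Assumes Hive 1.1.0+, where --orcfiledump supports -d (dump row data).
orc_peek() {
  local path="$1"        # HDFS location of the ORC file
  local rows="${2:-20}"  # how many output lines to show (default 20)
  # -d prints the rows instead of only the file metadata;
  # head keeps the output short, much like paging with more.
  hive --orcfiledump -d "$path" | head -n "$rows"
}
```

For example, orc_peek <hdfs-location-of-orc-file> 50 would show the first 50 lines of the dump. On Hive versions older than 1.1.0 the -d flag is not available and only the metadata dump works.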