How can I convert local ORC files to CSV?

Tags: csv, orc

I have an ORC file on my local machine and I need any reasonable format from it (e.g. CSV, JSON, YAML, ...).

How can I convert ORC to CSV?

Martin Thoma asked Feb 01 '19 15:02


1 Answer

  1. Download the Apache ORC source release
  2. Extract the files, go to the java folder, and run Maven: mvn install
  3. Use ORC-Tools

This is how I use them - you will likely need to adjust the paths:

java -jar ~/.m2/repository/org/apache/orc/orc-tools/1.5.4/orc-tools-1.5.4-uber.jar data ~/your_file.orc > output.json

The output is in JSON Lines format, which is easy to convert to CSV with pandas. First I needed to remove the last two lines from the output file. Then:

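import pandas as pd

# Each line of output.json is one JSON record (JSON Lines).
df = pd.read_json('output.json', lines=True)
# index=False leaves the pandas row index out of the CSV.
df.to_csv('output.csv', index=False)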
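
If you would rather not trim the output file by hand, here is a rough sketch that uses only the Python standard library instead of pandas. It assumes the extra lines at the end are not valid JSON objects, which I have not verified for every ORC-Tools version. The function name json_lines_to_csv and the sample data are made up for illustration.

```python
import csv
import io
import json


def json_lines_to_csv(lines, out):
    """Write records parsed from JSON Lines text to `out` as CSV.

    Blank lines and lines that do not parse as a JSON object are skipped.
    Returns the number of records written.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: anything that is not JSON is tool output, not data.
            continue
        if isinstance(record, dict):
            rows.append(record)

    # Collect column names in first-seen order so records with
    # differing keys still fit into one header.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Hypothetical sample: two records followed by two non-data lines.
    sample = [
        '{"id": 1, "name": "a"}',
        '{"id": 2, "name": "b"}',
        "________________________",
        "",
    ]
    buffer = io.StringIO()
    json_lines_to_csv(sample, buffer)
    print(buffer.getvalue(), end="")
```

For a real file you would pass open file handles, for example open('output.json') as the input and open('output.csv', 'w', newline='') as the output. Note that nested values (structs, lists, maps) end up as their Python string representation in the CSV, so this is probably only good enough for flat schemas.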
Martin Thoma answered Oct 14 '22 16:10