I have an ORC file on my local machine and I need any reasonable format from it (e.g. CSV, JSON, YAML, ...).
How can I convert ORC to CSV?
For PC, head to the "File" menu, and choose "Save as". You'll type in the name of your file, and add the extension ". csv". This will automatically change it to CSV format!
You need a suitable software like Digital Orchestrator Plus to open an ORC file. Without proper software you will receive a Windows message "How do you want to open this file?" or "Windows cannot open this file" or a similar Mac/iPhone/Android alert.
The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.
java
folder and execute maven: mvn install
This is how I use them - you will likely need to adjust the paths:
java -jar ~/.m2/repository/org/apache/orc/orc-tools/1.5.4/orc-tools-1.5.4-uber.jar data ~/your_file.orc > output.json
The output is JSON Lines which is easy to convert to CSV. First I needed to remove the last two lines from the output. Then:
import pandas as pd
df = pd.read_json('output.json', lines=True)
df.to_csv('output.csv')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With