How to append ORC file

We have a requirement to append to ORC files. I searched online but found nothing. Also, org.apache.hadoop.hive.ql.io.orc.WriterImpl in ORC does not have an append API. Is there any way to append to ORC files (more specifically, using Java)?

Sachin asked Sep 27 '22 20:09


1 Answer

ORC data files are subdivided into independent stripes; each stripe is created in a single atomic step. See the official documentation for details.

I don't believe you can directly append to an existing file on-the-fly. That would mean leaving a corrupt stripe (hence a corrupt file) in case of a job crash while writing.

But you can

  • create a new ORC data file per reducer (each file will contain 1..N stripes, depending on the actual data volume vs. the orc.stripe.size property)
  • then "concatenate" these data files, together with the existing file(s), using Hive 0.14 or later
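
In Hive, the concatenation step is exposed as the ALTER TABLE ... CONCATENATE statement, which is supported for ORC tables from 0.14 onward. Below is a minimal Java sketch of issuing it over JDBC. It assumes a Hive JDBC driver is on the classpath and that the new ORC files already sit in the table (or partition) directory; the class name, method names, and the table and partition names in the usage note are hypothetical, not part of Hive or ORC.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Hive or ORC.
class OrcConcatenate {
    private static final String NAME = "[A-Za-z_][A-Za-z0-9_]*";
    // Accepts "table" or "db.table".
    private static final Pattern TABLE = Pattern.compile(NAME + "(\\." + NAME + ")?");
    private static final Pattern COLUMN = Pattern.compile(NAME);

    // Builds "ALTER TABLE t [PARTITION (k = 'v', ...)] CONCATENATE".
    static String buildConcatenateSql(String table, Map<String, String> partitionSpec) {
        if (!TABLE.matcher(table).matches()) {
            throw new IllegalArgumentException("Invalid table name: " + table);
        }
        StringBuilder sql = new StringBuilder("ALTER TABLE ").append(table);
        if (!partitionSpec.isEmpty()) {
            StringJoiner spec = new StringJoiner(", ", " PARTITION (", ")");
            for (Map.Entry<String, String> e : partitionSpec.entrySet()) {
                String key = e.getKey();
                String value = e.getValue();
                if (!COLUMN.matcher(key).matches()) {
                    throw new IllegalArgumentException("Invalid partition column: " + key);
                }
                // Keep the sketch simple: reject values that would need escaping.
                if (value.indexOf('\'') >= 0 || value.indexOf('\\') >= 0) {
                    throw new IllegalArgumentException("Unsupported partition value: " + value);
                }
                spec.add(key + " = '" + value + "'");
            }
            sql.append(spec.toString());
        }
        return sql.append(" CONCATENATE").toString();
    }

    // Runs the statement on an already opened Hive JDBC connection.
    static void concatenate(Connection conn, String table, Map<String, String> partitionSpec)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(buildConcatenateSql(table, partitionSpec));
        }
    }
}
```

For example, calling concatenate(conn, "events", spec) with spec mapping dt to 2015-06-01 would send ALTER TABLE events PARTITION (dt = '2015-06-01') CONCATENATE. Hive then merges the small ORC files in that partition at the stripe level, so no existing file is modified in place.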
Samson Scharfrichter answered Oct 09 '22 22:10