
Can I add a new column without rewriting an entire file?

Tags:

apache-arrow

I've been experimenting with Apache Arrow. I have used column-oriented, memory-mapped files for many years, and in the past I've used a separate file for each column. Arrow seems to prefer storing everything in one file. Is there a way to add a new column without rewriting the entire file?

Kevin Atteson asked Nov 24 '25 19:11



1 Answer

The short answer is probably no.

Arrow's in-memory format and libraries do support this. You can add a chunked array to a table by creating a new table that reuses the existing columns (this should be zero-copy).

However, it appears you are talking about storing tables in files. None of the common file formats in use (Parquet, CSV, Feather) support splitting a table's columns across files in this way, so adding a column means writing a new file.

Keep in mind that when you read a Parquet file, you can specify which column(s) you want and only the necessary data will be read. So if your goal is only to support retrieving or querying individual columns, you can just build one large table with all your columns.

Pace answered Nov 28 '25 15:11



