Convert Julia DataFrame to an array of bytes for compression

I loaded two datasets from CSV files and merged them with leftjoin:

using CSV
using DataFrames
using CodecZstd

df1 = CSV.read(joinpath(root, "data", "raw", "df1.csv"), DataFrame)
df2 = CSV.read(joinpath(root, "data", "raw", "df2.csv"), DataFrame)

merged = leftjoin(df1, df2, on=:id)

Now I want to write the merged DataFrame to disk as a .zst file (Zstandard compression).

I managed this by first writing to .csv, reading that file back, and writing it again as .zst. Is there a way to convert a DataFrame directly into an array of bytes that I can save to disk?

psych0groov3 asked Dec 30 '25 21:12

1 Answer

To answer your question directly, you can wrap the output file in a ZstdCompressorStream and let CSV.write write to it:

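using CSV, DataFrames, CodecZstd

df = DataFrame(a='a':'h', b=1:8)
# Wrap the file handle so everything written to it is Zstandard-compressed
fout = ZstdCompressorStream(open("z.zst", "w"))
CSV.write(fout, df)  # the destination comes first, then the table
close(fout)          # flushes the compressor and closes the underlying file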

The file can then be read back with:

julia> CSV.read(ZstdDecompressorStream(open("z.zst")), DataFrame)
8×2 DataFrame
 Row │ a        b
     │ String1  Int64
─────┼────────────────
   1 │ a            1
   2 │ b            2
   3 │ c            3
   4 │ d            4
   5 │ e            5
   6 │ f            6
   7 │ g            7
   8 │ h            8
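
If you want a literal array of bytes in memory rather than a stream, one possible sketch is to write the CSV into an IOBuffer and compress the resulting bytes with transcode. The variable names (buf, csv_bytes, compressed, restored) are illustrative, and the file name z.zst is reused from above only for convenience:

```julia
using CSV, DataFrames, CodecZstd

df = DataFrame(a = 'a':'h', b = 1:8)

# Serialize the DataFrame to CSV text in memory instead of on disk.
buf = IOBuffer()
CSV.write(buf, df)
csv_bytes = take!(buf)   # Vector{UInt8} holding the uncompressed CSV text

# Compress the byte vector in one call; the result is again a Vector{UInt8}.
compressed = transcode(ZstdCompressor, csv_bytes)

# Save the compressed bytes to disk.
write("z.zst", compressed)

# Round trip: read the bytes, decompress in memory, and parse the CSV.
restored = CSV.read(transcode(ZstdDecompressor, read("z.zst")), DataFrame)
```

This holds both the uncompressed and the compressed data in memory at once, so for large tables the streaming version is likely the better choice.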

Another reasonable option is to write the DataFrame with Apache Arrow instead of CSV. The compression composes in the same way as above.
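
A minimal sketch of the Arrow variant, assuming the Arrow.jl package is installed. The file name z.arrow.zst and the use of a String column (instead of the Char column used earlier) are choices made for this illustration only:

```julia
using Arrow, DataFrames, CodecZstd

# String column chosen for this sketch; the earlier example used Char values.
df = DataFrame(a = string.('a':'h'), b = 1:8)

# Same pattern as with CSV: wrap the file handle in a Zstd compressor.
fout = ZstdCompressorStream(open("z.arrow.zst", "w"))
Arrow.write(fout, df)
close(fout)   # flushes the compressor and closes the underlying file

# Read back: decompress into a byte vector, then parse it as Arrow data.
fin = ZstdDecompressorStream(open("z.arrow.zst"))
bytes = read(fin)
close(fin)
df2 = DataFrame(Arrow.Table(bytes))
```

Arrow.write also accepts a compress keyword (for example compress=:zstd) that compresses the buffers inside the Arrow data itself. Check the Arrow.jl documentation for how it behaves in your version.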

Przemyslaw Szufel answered Jan 04 '26 20:01
