Convert Julia DataFrame to an array of bytes for compression

Question

So I loaded two datasets from CSV files and then merged them using a leftjoin:

using CSV
using DataFrames
using CodecZstd

# root: the project's base directory (defined elsewhere in my script)
df1 = CSV.read(joinpath(root, "data", "raw", "df1.csv"), DataFrame)
df2 = CSV.read(joinpath(root, "data", "raw", "df2.csv"), DataFrame)

merged = leftjoin(df1, df2, on=:id)

Now I want to write the merged dataframe to disk as a .zst compressed file (Zstandard compression).

I managed to do it by first writing to .csv, reading that file back, and then writing it again as .zst. Is there a way to convert a DataFrame directly into an array of bytes so I can compress it and save it to disk?

Przemyslaw Szufel · Accepted Answer

To answer your question precisely, you can write the CSV output straight into a Zstd compressor stream, with no intermediate file:

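using CSV, DataFrames, CodecZstd
fout = ZstdCompressorStream(open("z.zst", "w"))  # compressing wrapper around the file handle
df = DataFrame(a='a':'h', b=1:8)
CSV.write(fout, df)  # destination first, then the table; CSV text is compressed as it is written
close(fout)          # finishes the Zstd frame and closes the underlying file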

Now this can be read as:

julia> CSV.read(ZstdDecompressorStream(open("z.zst")), DataFrame)
8×2 DataFrame
 Row │ a        b
     │ String1  Int64
─────┼────────────────
   1 │ a            1
   2 │ b            2
   3 │ c            3
   4 │ d            4
   5 │ e            5
   6 │ f            6
   7 │ g            7
   8 │ h            8
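If you literally want the DataFrame as an in-memory byte array first (as the question asks), a minimal sketch, assuming CSV as the serialization format, is to let CSV.write target an IOBuffer, take! its bytes, and compress them with transcode(ZstdCompressor, ...) from TranscodingStreams/CodecZstd. The file name df_bytes.zst is only an example. Unlike the streaming version above, this holds the whole uncompressed CSV in memory.

```julia
using CSV, DataFrames, CodecZstd

# Example table; the question's `merged` DataFrame would be handled the same way
df = DataFrame(a = 'a':'h', b = 1:8)

# Serialize the DataFrame as CSV into an in-memory buffer and take its bytes
buf = IOBuffer()
CSV.write(buf, df)
csv_bytes = take!(buf)                 # Vector{UInt8} holding the CSV text

# Compress the whole byte array in a single call
zst_bytes = transcode(ZstdCompressor, csv_bytes)

# Save the compressed bytes to disk (example file name)
path = "df_bytes.zst"
write(path, zst_bytes)

# Round trip: read the file, decompress it and parse the CSV again
raw = transcode(ZstdDecompressor, read(path))
df2 = CSV.read(IOBuffer(raw), DataFrame)
```

The resulting file is an ordinary .zst file, so it can also be read back with the ZstdDecompressorStream one-liner shown above.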

Another reasonable option would be to serialize the DataFrame with Apache Arrow instead of CSV. The compression stream composes with it in the same way as above.
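A sketch of the Arrow route, assuming the Arrow.jl package (the answer names Apache Arrow but not a specific package): Arrow.write accepts any IO, so it can write into the same ZstdCompressorStream wrapper, and Arrow.Table can read from the decompressing stream. Arrow.jl can also compress internally via its compress keyword; that produces an Arrow file with zstd-compressed buffers rather than a .zst file. File names are examples, and String columns are used to keep the example simple.

```julia
using Arrow, DataFrames, CodecZstd

df = DataFrame(a = string.('a':'h'), b = 1:8)

# Variant 1: wrap the output file in a Zstd stream, as in the CSV example
fout = ZstdCompressorStream(open("df.arrow.zst", "w"))
Arrow.write(fout, df)   # writes Arrow IPC data into the compressor
close(fout)             # finishes the Zstd frame and closes the file

# Reading reverses the wrapping: decompress, then parse the Arrow data
fin = ZstdDecompressorStream(open("df.arrow.zst"))
df2 = DataFrame(Arrow.Table(fin))   # Arrow.Table reads all bytes from the stream
close(fin)

# Variant 2: let Arrow.jl compress the column buffers itself
# (an Arrow file with zstd-compressed buffers, not a plain .zst file)
Arrow.write("df_zstd.arrow", df; compress = :zstd)
df3 = DataFrame(Arrow.Table("df_zstd.arrow"))
```

Variant 1 keeps the on-disk file a standard .zst container; variant 2 keeps the file readable by any Arrow reader that supports buffer compression.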


Tags: julia, dataframes.jl

psych0groov3
