I have a large Pandas DataFrame in Python that I would like to access in a Julia program (as a Julia DataFrames.DataFrame object). Since I would like to avoid writing to disk for each DataFrame sent from Python to Julia, it seems that serializing the DataFrame to an Apache Arrow/Feather buffer in memory and sending that buffer via TCP from Python to Julia would be ideal.
I have tried extensively but cannot figure out how to set this up.
Thanks for your help.
Hmmm, good question. I'm not sure a TCP socket is necessarily the easiest route, since you need one end to act as the "server" socket and the other as the client. The typical TCP flow is: 1) the server binds and listens on a port, 2) the server calls "accept" to wait for a new connection, 3) the client calls "connect" on that port to initiate the connection, 4) once the server accepts, the connection is established and server/client can write data to each other over the connected socket.
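If you do go the socket route, here's a rough sketch of what the Python "client" side could look like: it serializes the table to Arrow IPC stream bytes in memory and writes them to the socket. The host/port (127.0.0.1:9000) and the assumption that the Julia process is already listening are just placeholders I made up; on the Julia side you'd accept the connection with the Sockets stdlib and should be able to hand the received bytes straight to Arrow.Table.

```python
# Minimal sketch of the Python "client" side, assuming a Julia process is
# already listening on 127.0.0.1:9000 (host/port are placeholders).
import socket

import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
table = pa.Table.from_pandas(df)

# Serialize to Arrow IPC *stream* bytes entirely in memory -- no disk involved.
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
payload = sink.getvalue().to_pybytes()

# Connect to the waiting server and push the bytes; closing the socket lets
# the reader on the other end see end-of-stream.
with socket.create_connection(("127.0.0.1", 9000)) as conn:
    conn.sendall(payload)
```

The IPC stream format (pa.ipc.new_stream) is the natural fit for a socket, since the reader consumes it sequentially and never needs to seek, unlike the Arrow file format.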
I've had success doing something similar to what you've described by using mmapped files, though maybe you have a hard requirement not to touch disk at all. It works nicely because the Python and Julia processes simply "share" the mmapped file.
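For the shared-file route, a rough sketch of the Python side is below. The /dev/shm path is just my placeholder for any location both processes can see (on Linux it keeps the file in RAM, so you still avoid hitting the disk itself), and the read-back through pa.memory_map is only there to show that a consumer can map the file instead of copying it. As far as I recall, Arrow.Table("path") on the Julia side memory-maps the file in much the same way.

```python
# Rough sketch of the shared-file approach; the path is a placeholder for any
# location both processes can see (/dev/shm keeps it in RAM on Linux).
import pandas as pd
import pyarrow as pa
import pyarrow.feather as feather

path = "/dev/shm/shared.arrow"

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
# Feather v2 is the Arrow IPC file format; skipping compression keeps the
# on-disk bytes in the Arrow memory layout so they can be mapped directly.
feather.write_feather(df, path, compression="uncompressed")

# Reading through a memory map avoids copying the whole file into this
# process; the Julia process can open the same path independently.
with pa.memory_map(path, "r") as source:
    table = pa.ipc.open_file(source).read_all()
print(table.to_pandas())
```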
Another approach you could check out is what I set up for "round trip" testing in the Arrow.jl Julia package: https://github.com/apache/arrow-julia/blob/main/test/pyarrow_roundtrip.jl. It uses PyCall.jl from Julia to share the bytes between Python and Julia.
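That script does the sharing across the language boundary via PyCall.jl; purely as my own Python-only simplification (not what the test script actually contains), the byte-level round trip it relies on looks like this:

```python
# Python-only illustration of the byte-level round trip: serialize to an
# in-memory IPC buffer, then rebuild the table from those same bytes.
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
table = pa.Table.from_pandas(df)

sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
buf = sink.getvalue()  # pyarrow.Buffer holding the IPC stream bytes

# Anything that can see `buf` (here: the same process; in the test script:
# Julia talking to Python via PyCall.jl) can reconstruct the table without
# touching disk.
roundtripped = pa.ipc.open_stream(buf).read_all()
assert roundtripped.equals(table)
```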
Hope that helps!