I have two questions about using FlatBuffers in Python, both of which come down to how to use them correctly without writing code that defeats their performance advantage. I want to use FlatBuffers for serialization and network communication between a C# and a Python program. I have read the tutorial, the Python-specific documentation and some blog posts that use FlatBuffers from other languages, but I couldn't find one for Python.
1.) FlatBuffers are meant for fast serialization. Is this even true for Python? The documentation rates Python's performance only as "Ok" where other languages get "Great", and specific timings are missing. I know that Python is generally not as fast as C or C++, but how slow are we talking? Slow enough to defeat its promised performance advantage (for example compared to JSON)? Has someone already done a benchmark with Python? If not, I will try to write one that compares times between C# and Python, and also FlatBuffers vs JSON in Python.
2.) It is fast because of "zero copy". But what does that mean for a program that needs to alter the data, especially since the objects are immutable? In order to work with them I need to copy the values into my local representation of the objects anyway. Doesn't that defeat the purpose? The tutorial gives this example for reading from a flatbuffer:
import MyGame.Example as example
import flatbuffers
buf = open('monster.dat', 'rb').read()
buf = bytearray(buf)
monster = example.GetRootAsMonster(buf, 0)
hp = monster.Hp()
pos = monster.Pos()
Aren't those last two lines just copies?
The design of FlatBuffers heavily favors languages like C/C++/Rust in attaining maximum speed. The Python implementation mimics what these languages do, but it is very unnatural for Python, so it is not the fastest possible serializer design that you would get if you designed purely for Python.
I haven't benchmarked anything on Python, but a Python specific design would certainly beat FlatBuffers-Python in many cases. One case where the FlatBuffers design will win even in Python is for large files that are accessed sparsely or randomly, since it doesn't actually unpack all the data at once.
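To make that sparse-access point concrete with the tutorial's Monster example, here is a minimal sketch. The import path and accessor names depend on how flatc generated the code for that schema, and 'monster.json' is a hypothetical JSON equivalent of monster.dat, so treat those as assumptions:

import json
import MyGame.Example.Monster as Monster  # generated by `flatc --python` from the tutorial schema

# JSON has to parse the whole document before a single field can be read.
doc = json.loads(open('monster.json').read())
hp_from_json = doc['hp']

# FlatBuffers only reads the root offset here; nothing else is unpacked yet.
buf = bytearray(open('monster.dat', 'rb').read())
monster = Monster.Monster.GetRootAsMonster(buf, 0)

# Each accessor follows the vtable and decodes just the bytes it touches,
# so one field can be read out of a very large buffer at roughly constant cost.
hp_from_buffer = monster.Hp()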
You typically use FlatBuffers because you have the performance-critical part of your stack in a faster language, and you also want to be able to process the data in Python elsewhere. If you work purely in Python, however, FlatBuffers is possibly not your best pick (unless, again, you work with large, sparse data).
Better, of course, is not to do your heavy lifting in Python in the first place.
I have now written a benchmark in Python to compare JSON and FlatBuffers, and I think the results could benefit someone, so here we go:
The setup is as follows: a client-server architecture (on the same machine), both sides written in Python with sockets and asyncio. The test data is a big dictionary whose values are strings, numbers and lists; the lists contain further dictionaries, again with string, number and list values. The tree is at most 3 levels deep, with around 100 objects per list.
The flatbuffer schema uses tables for the dicts, vectors for the lists and structs for dicts that only use float and int fields.
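For reference, data of roughly that shape can be generated like this (a hypothetical generator, not the exact data used in this benchmark):

import random
import string

def make_leaf():
    # innermost dictionaries only hold strings and numbers
    return {
        'name': ''.join(random.choices(string.ascii_letters, k=12)),
        'value': random.random(),
        'count': random.randint(0, 1000),
    }

def make_node(levels, fanout=100):
    node = make_leaf()
    if levels > 1:
        # each list holds around 100 child dictionaries
        node['children'] = [make_node(levels - 1, fanout) for _ in range(fanout)]
    return node

test_data = make_node(3)  # dictionaries nested at most 3 levels deep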
The test data for the flatbuffer test is:
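A rough sketch of what the flatbuffer side of such a test can look like (the generated Node module and its helper functions are assumptions for an illustrative schema, not the exact generated code used in this benchmark):

import time
import flatbuffers
import MySchema.Node as Node  # hypothetical module generated by `flatc --python`

def build_node(builder, data):
    # strings and vectors must be created before the table that refers to them
    name = builder.CreateString(data['name'])
    Node.NodeStart(builder)
    Node.NodeAddName(builder, name)
    Node.NodeAddValue(builder, data['value'])
    Node.NodeAddCount(builder, data['count'])
    return Node.NodeEnd(builder)

start = time.time()
builder = flatbuffers.Builder(1024)
root = build_node(builder, {'name': 'example', 'value': 1.5, 'count': 3})
builder.Finish(root)
buf = builder.Output()  # the finished bytes that go over the socket
print('serialize time:', time.time() - start)

Every field is a separate Python-level call into the builder, and vectors need one call per element, which is why building a large tree this way is slow in pure Python.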
The test data for the JSON test is:
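The JSON side only needs the standard library and looks roughly like this (again a sketch, with a small stand-in dict instead of the real tree):

import json
import time

test_data = {'name': 'example', 'value': 1.5,
             'children': [{'name': 'child', 'count': 3}] * 100}  # stand-in for the real tree

start = time.time()
payload = json.dumps(test_data).encode('utf-8')   # serialize to bytes for the socket
serialize_time = time.time() - start

start = time.time()
restored = json.loads(payload)                    # deserialize back into dicts/lists
deserialize_time = time.time() - start

print('serialize time:', serialize_time)
print('deserialize time:', deserialize_time)
print('size in bytes:', len(payload))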
I know there are some points one could argue about regarding the setup, for example not transforming the data back into a dict in the flatbuffer test. If someone is actually interested in this, I could extend the test.
But now on to the results:
--- flatbuffers ---
roundtrip time: 7.010654926300049
serialize time: 6.960820913314819
deserialize time: 0.0
size in bytes: 6,365,432
--- json ---
roundtrip time: 0.7860651016235352
serialize time: 0.3211710453033447
deserialize time: 0.28783655166625977
size in bytes: 13,946,172
My conclusion is that one should not use FlatBuffers in Python if you want to create or edit the data quickly. There is no way to mutate the data in Python, which means you would have to rebuild the flatbuffer every time something changes, and that is very slow.
On the bright side, it is very fast to read the data, and the byte size is much smaller than with JSON. So if you have static data that you want to send or read many times, flatbuffers are a good solution.
You do not refer to any specific link. I guess the performance of flatbuffers is going to depend on how the serialization is done from Python while calling the API; Python is known to be slower than, say, C or C++ at that.
Regarding zero-copy - Google (and Wikipedia) is your friend.
The tutorial says "depending on language". What you are saying suggests that in Python you won't get exceptions.
What does the documentation say? Do your experiments confirm it? (Show us some effort at solving the problem.)
Hard to say. What have you tried and what results have you got?