I'm using PyTables to store a data array, which works fine; along with it I need to store a moderately large (50K-100K) Unicode string containing JSON data, and I'd like to compress it.
How can I do this in PyTables? It's been a long time since I've worked with HDF5, and I can't remember the right way to store character arrays so they can be compressed. (And I can't seem to find a similar example of doing this on the PyTables website.)
PyTables does not natively support Unicode yet. To store a Unicode string, first encode it to bytes (for example as UTF-8), then store the bytes in a VLArray of length-1 strings or uint8 values. To get compression, simply instantiate the array with a Filters instance that has a non-zero complevel.
All of the examples I know of that store JSON data this way use the HDF5 C API.
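That said, the steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested recipe: the helper names `save_json` and `load_json`, the node name `json_blob`, and the choice of zlib at level 5 are my own assumptions, and it assumes PyTables 3.x (`open_file`, `create_vlarray`) plus NumPy.

```python
import json

import numpy as np
import tables


def save_json(path, obj, node_name="json_blob", complevel=5):
    """Encode obj as UTF-8 JSON and store it as one row of a VLArray of uint8."""
    # Unicode -> bytes: PyTables stores the raw bytes, not the str itself.
    raw = json.dumps(obj, ensure_ascii=False).encode("utf-8")
    # A non-zero complevel is what switches compression on.
    filters = tables.Filters(complevel=complevel, complib="zlib")
    # Mode "a" so the blob can live in the same file as an existing data array.
    with tables.open_file(path, mode="a") as h5:
        vlarray = h5.create_vlarray(
            h5.root, node_name, atom=tables.UInt8Atom(), filters=filters
        )
        # The whole string becomes a single variable-length row.
        vlarray.append(np.frombuffer(raw, dtype=np.uint8))


def load_json(path, node_name="json_blob"):
    """Read row 0 back, decode the bytes as UTF-8, and parse the JSON."""
    with tables.open_file(path, mode="r") as h5:
        row = h5.get_node(h5.root, node_name)[0]
    return json.loads(row.tobytes().decode("utf-8"))
```

Usage would look like `save_json("data.h5", {"key": "value"})` followed by `load_json("data.h5")`. I have not measured the compression ratio; if it turns out to be poor for a single large string, it may be worth comparing against a fixed-length chunked array (`create_carray` with the same atom and filters), since I believe HDF5 treats variable-length data differently when applying filters.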