
How do you create a compressed dataset in PyTables that can store a Unicode string?

I'm using PyTables to store a data array, which works fine; along with it I need to store a moderately large (50K-100K) Unicode string containing JSON data, and I'd like to compress it.

How can I do this in PyTables? It's been a long time since I've worked with HDF5, and I can't remember the right way to store character arrays so they can be compressed. (And I can't seem to find a similar example of doing this on the PyTables website.)

Jason S asked Jan 14 '14 23:01



1 Answer

PyTables does not natively support Unicode yet. To store a Unicode string, first convert it to bytes (for example, by encoding it as UTF-8), then store the bytes in a VLArray of length-1 strings or uint8 values. To get compression, instantiate the array with a Filters instance that has a non-zero complevel.
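As a rough illustration of the approach described above, here is a minimal sketch that encodes the string as UTF-8 and stores it as a single row in a uint8 VLArray created with a Filters instance. The node name `json_text`, the helper function names, and the choice of `complevel=5` with `zlib` are assumptions made for the example, not anything PyTables requires. The sketch does not measure how much the data actually shrinks, so it may be worth comparing file sizes on your own data.

```python
import numpy as np
import tables


def write_json_text(path, text, complevel=5):
    """Store a Unicode string as UTF-8 bytes in one row of a uint8 VLArray.

    The node name "json_text" and the zlib/complevel settings are
    illustrative assumptions.
    """
    filters = tables.Filters(complevel=complevel, complib="zlib")
    # Encode to bytes, then view the bytes as a 1-D array of uint8 values.
    data = np.frombuffer(text.encode("utf-8"), dtype=np.uint8).copy()
    with tables.open_file(path, mode="w") as h5:
        vla = h5.create_vlarray(
            h5.root,
            "json_text",
            atom=tables.UInt8Atom(),
            title="UTF-8 encoded JSON",
            filters=filters,
        )
        vla.append(data)  # the whole string becomes row 0


def read_json_text(path):
    """Read row 0 of the VLArray back and decode it to a Unicode string."""
    with tables.open_file(path, mode="r") as h5:
        raw = h5.root.json_text[0]  # NumPy array of uint8
    return raw.tobytes().decode("utf-8")
```

Typical use would be to call `write_json_text("data.h5", json.dumps(obj))` when saving and `json.loads(read_json_text("data.h5"))` when loading. In a file that already holds the data array, the same `create_vlarray` call could be made on the existing open file handle instead of opening a new file.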

All of the examples I know of that store JSON data like this do so using the HDF5 C API.

Anthony Scopatz answered Sep 20 '22 23:09
