I have data coming from a CSV file with a few thousand columns and roughly ten thousand rows. Within each column the data is all of one type, but different columns hold different types*. Until now I have been pickling the data from NumPy and storing it on disk, but this is quite slow, especially because I usually want to load only a subset of the columns rather than all of them.
I want to put the data into HDF5 using PyTables. My first approach was to put the data in a single table, with one HDF5 column per CSV column. Unfortunately this didn't work, I assume because of the 512 (soft) column limit.
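For reference, here is a minimal sketch of that single-table layout, scaled down to three columns so that it stays well under any column limit. The column names (`col_a`, `col_b`, `col_c`), their dtypes, the node name `data`, and the two helper functions are all made up for illustration; the real file has a few thousand columns.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical columns standing in for the real CSV columns (assumed dtypes).
row_dtype = np.dtype([
    ("col_a", np.float64),
    ("col_b", np.int32),
    ("col_c", "S8"),
])


def write_single_table(path, records):
    """Write a structured array as one Table, one HDF5 column per field."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "data", description=row_dtype)
        table.append(records)


def read_column(path, name):
    """Read a single named column from the table as a NumPy array."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.data.col(name)


# Ten rows of placeholder values instead of the parsed CSV contents.
records = np.zeros(10, dtype=row_dtype)
records["col_a"] = np.arange(10) * 0.5
records["col_b"] = np.arange(10)
records["col_c"] = b"x"

path = os.path.join(tempfile.mkdtemp(), "example.h5")
write_single_table(path, records)
col_b = read_column(path, "col_b")
```

With only a handful of columns this works and gives the per-column access I am after; the trouble starts when the description grows to thousands of fields.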
What is a sensible way to store this data?
* I mean the type of the data after it has been converted from text.