 

How to store wide tables in pytables / hdf5


I have data coming from a CSV file that has a few thousand columns and roughly ten thousand rows. Within each column the data are all of the same type, but different columns hold data of different types*. Until now I have been pickling the data from NumPy and storing it on disk, but that is quite slow, especially because I usually want to load only a subset of the columns rather than all of them.
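A minimal sketch of the pickle-based workflow described above, assuming the CSV is parsed with `numpy.genfromtxt` into a structured array (one field per column, each with its own inferred type). The CSV contents and the helper names `load_csv`, `save` and `load_columns` are hypothetical. It shows why reading a few columns is slow: pickle has no random access, so the whole array is deserialised every time.

```python
import io
import pickle

import numpy as np

# Hypothetical stand-in for the real CSV: three columns of different types.
CSV_TEXT = """a,b,c
1,2.5,x
3,4.0,y
"""


def load_csv(text):
    # dtype=None makes genfromtxt infer one type per column after parsing the
    # text, producing a structured array whose fields are the CSV columns.
    return np.genfromtxt(io.StringIO(text), delimiter=",", names=True,
                         dtype=None, encoding="utf-8")


def save(arr, fileobj):
    # Serialise the entire structured array in one go.
    pickle.dump(arr, fileobj, protocol=pickle.HIGHEST_PROTOCOL)


def load_columns(fileobj, names):
    # pickle cannot read part of an object: the whole array is loaded
    # even when only a few columns are wanted.
    arr = pickle.load(fileobj)
    return {name: arr[name] for name in names}


data = load_csv(CSV_TEXT)
buf = io.BytesIO()  # stands in for a file on disk
save(data, buf)
buf.seek(0)
subset = load_columns(buf, ["b"])
```

With thousands of columns, the cost of `pickle.load` stays the same no matter how few names are passed to `load_columns`.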

I want to put the data into HDF5 using PyTables, and my first approach was to put everything in a single table, with one HDF5 column per CSV column. Unfortunately this didn't work; I assume that is because of the 512-column (soft) limit.
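To make the "one HDF5 column per CSV column" layout concrete, here is a sketch that builds the corresponding row type as a NumPy structured dtype, which PyTables accepts as the `description` argument of `File.create_table`. It runs without PyTables: the 512 value is hard-coded to match the soft limit mentioned above (PyTables exposes it as `tables.parameters.MAX_COLUMNS`). The column types, the field names `col0`, `col1`, … and the helper `wide_table_dtype` are hypothetical, and the sketch does not attempt to show why table creation fails.

```python
import numpy as np

# Soft limit referred to in the question; hard-coded here so the sketch
# runs without PyTables (see tables.parameters.MAX_COLUMNS).
SOFT_COLUMN_LIMIT = 512


def wide_table_dtype(column_types):
    """One structured-dtype field per CSV column (hypothetical names col0, col1, ...)."""
    return np.dtype([(f"col{i}", t) for i, t in enumerate(column_types)])


# Hypothetical: a few thousand columns cycling through three types.
column_types = [np.int64, np.float64, "S8"] * 1000
row_dtype = wide_table_dtype(column_types)
over_limit = len(row_dtype.names) > SOFT_COLUMN_LIMIT
print(len(row_dtype.names), "columns; over soft limit:", over_limit)

# With PyTables installed, this dtype would be the table description, e.g.:
#   h5file.create_table("/", "data", description=row_dtype)
```

Every row of such a table is a single compound record of all the columns, which is the layout that runs into the column limit.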

What is a sensible way to store this data?

* I mean, the type of the data after it has been converted from text.