Saving pandas dataframe to file using bcolz

Question

I want to use bcolz to save a pandas dataframe to file.

I have tried:

import bcolz
import pandas as pd

# Read the tab-delimited file into a DataFrame
df = pd.read_csv("mydata.csv", delimiter='\t')
# Build an in-memory, compressed ctable from the DataFrame
ct = bcolz.ctable.fromdataframe(df)

After this, ct holds the compressed dataframe, but I can't find how to save it to a file.

Jeff · Accepted Answer

You just need to tell bcolz where to create the table on disk when you convert the dataframe, by passing rootdir:

import bcolz
import pandas as pd

df = pd.read_csv("mydata.csv", delimiter='\t')
# rootdir makes the ctable persistent: its data is stored in this directory
ct = bcolz.ctable.fromdataframe(df, rootdir='dataframe.bcolz')

Francesc · Answer

You can use bcolz with persistent data containers in exactly the same way as in-memory ones. You may want to have a look at this tutorial, which works with on-disk datasets using pandas/HDF5, pure PyTables, SQLite and bcolz:

https://github.com/FrancescAlted/EuroPython2015/blob/master/4-On-Disk-Tables.ipynb

maxymoo · Answer

It looks like bcolz.ctable has a tohdf5 method which you could use; however, you will need to install HDF5, PyTables, etc. Otherwise you could use pickle, which is the usual way to save a generic Python object to disk.
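As a rough illustration of the pickle route, here is a minimal sketch using only the standard library. The helper names save_object and load_object are made up for this example, and the dict of columns is just a stand-in for your data. A pandas DataFrame can be pickled the same way. Whether a bcolz ctable pickles cleanly is not something I have checked, so treat that part as an assumption.

```python
import pickle


def save_object(obj, path):
    """Serialize any picklable Python object to a file (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Load an object previously written by save_object (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Stand-in data: a dict of columns instead of a real DataFrame.
    table = {"id": [1, 2, 3], "name": ["a", "b", "c"]}
    save_object(table, "table.pkl")
    restored = load_object("table.pkl")
    print(restored == table)
```

Keep in mind that pickle does not compress anything by itself, and that you should only unpickle files you trust.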

By the way, if you're just interested in compressing your data, you might want to look at a more low-tech option like gzip; the compression will likely be about as good as, if not better than, that of a columnar data format, which is designed more for running fast queries against your data.
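A minimal sketch of the gzip option, again with just the standard library. The helper names gzip_file and gunzip_bytes and the output name mydata.csv.gz are assumptions for illustration; the sample file is written by the script so that it runs on its own.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path):
    """Stream src_path into a gzip-compressed copy at dst_path (hypothetical helper)."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def gunzip_bytes(path):
    """Return the decompressed contents of a gzip file as bytes (hypothetical helper)."""
    with gzip.open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # Write a small tab-delimited sample so the example is self-contained.
    with open("mydata.csv", "wb") as f:
        f.write(b"id\tname\n" + b"1\tfoo\n" * 1000)

    gzip_file("mydata.csv", "mydata.csv.gz")

    # The round trip should give back exactly the original bytes.
    with open("mydata.csv", "rb") as f:
        print(gunzip_bytes("mydata.csv.gz") == f.read())
```

pandas can read the compressed file directly: pd.read_csv("mydata.csv.gz", delimiter='\t') infers gzip from the .gz extension, and df.to_csv accepts compression='gzip' when writing.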

Tags: python, pandas

M. Page
