Saving pandas dataframe to file using bcolz

Tags: python, pandas

I want to use bcolz to save a pandas dataframe to a file.

I have tried:

import bcolz
import pandas as pd

df = pd.read_csv(open("mydata.csv", 'rb'), delimiter='\t')
ct = bcolz.ctable.fromdataframe(df)

After that, ct contains the compressed dataframe, but I can't find out how to save it to a file.

M. Page asked Jul 26 '15 21:07


3 Answers

You simply need to specify where to create the table when you convert the dataframe, by passing a rootdir argument; the ctable is then created on disk in that directory:

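import bcolz
import pandas as pd

# Read the tab-delimited file; passing the path lets pandas open and close it.
df = pd.read_csv("mydata.csv", delimiter='\t')
# rootdir makes the ctable persistent: it is stored in this directory on disk.
ct = bcolz.ctable.fromdataframe(df, rootdir='dataframe.bcolz')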

Jeff answered Nov 19 '22 16:11


You can use bcolz's persistent (on-disk) data containers in exactly the same way as in-memory ones. You may want to have a look at this tutorial, which works with on-disk datasets using pandas/HDF5, pure PyTables, SQLite, and bcolz:

https://github.com/FrancescAlted/EuroPython2015/blob/master/4-On-Disk-Tables.ipynb

Francesc answered Nov 19 '22 14:11


It looks like bcolz.ctable has a tohdf5 method that you could use; however, you will need to install HDF5, PyTables, etc. Otherwise, you could use pickle, which is the usual way to save a generic Python object to disk.
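
As a rough sketch of the pickle route, the snippet below writes an object to disk and reads it back. To keep it runnable without bcolz, it pickles a small pandas DataFrame as a stand-in for the ctable; the column names, the file name dataframe.pkl, and the temporary directory are assumptions made up for illustration.

```python
import os
import pickle
import tempfile

import pandas as pd

# Small stand-in frame (hypothetical data); in the question it would
# come from mydata.csv.
df = pd.DataFrame({"id": [1, 2, 3], "value": [0.5, 1.5, 2.5]})

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "dataframe.pkl")

# Serialize the object to disk.
with open(path, "wb") as fh:
    pickle.dump(df, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back later, possibly from another process.
with open(path, "rb") as fh:
    restored = pickle.load(fh)
```

Keep in mind that pickle files should only be loaded from sources you trust, and that they may not load across different library versions.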

By the way, if you're just interested in compressing your data, you might want to look at a lower-tech option like gzip; the compression will be just as good as, if not better than, that of a columnar data format, which is more concerned with running fast queries against your data.
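
For the gzip option, one possible approach (a sketch, not something taken from the bcolz documentation) is to let pandas compress the file as it writes it. The sample data, the file name mydata.tsv.gz, and the temporary directory are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd

# Stand-in data (hypothetical); replace with your own dataframe.
df = pd.DataFrame({"id": range(1000), "label": ["a", "b"] * 500})

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "mydata.tsv.gz")

# Write a tab-delimited file, gzip-compressed on the fly.
df.to_csv(path, sep="\t", index=False, compression="gzip")

# Read it back; pandas decompresses while parsing.
restored = pd.read_csv(path, sep="\t", compression="gzip")
```

How well this compresses depends on the data, so it is worth comparing file sizes on your own dataset before settling on a format.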

maxymoo answered Nov 19 '22 16:11