I want to use bcolz to save a pandas dataframe to file.
I have tried:
import bcolz
import pandas as pd
df = pd.read_csv("mydata.csv", delimiter='\t')
ct = bcolz.ctable.fromdataframe(df)
After that, ct contains the compressed dataframe, but I can't find how to save it to a file.
You simply need to specify where to create the table on disk, via the rootdir argument, when you convert the dataframe, like so:
import bcolz
import pandas as pd
df = pd.read_csv("mydata.csv", delimiter='\t')
ct = bcolz.ctable.fromdataframe(df, rootdir='dataframe.bcolz')
You can use bcolz with persistent data containers in exactly the same way as with in-memory ones. You may want to have a look at this tutorial, which works with on-disk datasets using pandas/HDF5, pure PyTables, SQLite and bcolz:
https://github.com/FrancescAlted/EuroPython2015/blob/master/4-On-Disk-Tables.ipynb
It looks like bcolz.ctable has a tohdf5 method which you could use; however, you will need to install HDF5, PyTables, etc. Otherwise you could use pickle, which is the usual way to save a generic Python object to disk.
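As a rough illustration of the pickle route, here is a minimal sketch using only the standard library. The helper names save_object and load_object, the file name table.pkl, and the sample dict are made up for this example; the dict simply stands in for whatever picklable object you want to keep.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Serialize any picklable object to the file at `path`."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Load an object previously written by save_object."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical stand-in data; not taken from the question's CSV file.
table = {"id": [1, 2, 3], "value": [0.5, 1.5, 2.5]}

# Write to a scratch directory so the example leaves the working dir alone.
path = os.path.join(tempfile.mkdtemp(), "table.pkl")
save_object(table, path)
restored = load_object(path)
```

A pandas DataFrame is picklable, so it could be passed to save_object in the same way. Only unpickle files you trust, since loading a pickle can execute arbitrary code.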
By the way, if you're just interested in compressing your data, you might want to look at a more low-tech option like gzip; the compression will be just as good as, if not better than, that of a columnar data format, which is more concerned with running fast queries against your data.
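For the gzip option, a minimal sketch with the standard library could look like the following. The helper name gzip_file and the tiny tab-delimited sample file are assumptions made for illustration; in practice you would point it at your own file.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(src_path, dst_path):
    """Stream-compress src_path into dst_path without loading it all in memory."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Create a small tab-delimited sample file (hypothetical data).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "mydata.csv")
with open(src_path, "w", newline="") as fh:
    fh.write("id\tvalue\n1\t0.5\n2\t1.5\n")

# Compress it, then read it back as text to check the round trip.
gz_path = src_path + ".gz"
gzip_file(src_path, gz_path)
with gzip.open(gz_path, "rt", newline="") as fh:
    roundtrip = fh.read()
```

pandas can read such a file directly, for example pd.read_csv("mydata.csv.gz", delimiter='\t'), because it infers gzip compression from the .gz extension by default.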