 

Convert a Python SQLite db to HDF5

A pandas DataFrame can be converted to an HDF5 file like this:

df.to_hdf('test_store.hdf', key='test', mode='w')

I have an SQLite db file that has to be converted to an HDF5 file, which I would then read through pandas using pd.read_hdf.

But first, how do I convert a Python SQLite db to an HDF5 file?

EDIT:

I am aware of the .read_sql method in pandas, but I would like to convert the db to HDF5 first.

richie asked Apr 08 '14 11:04


2 Answers

This is surprisingly simple: Use pandas!

pandas supports reading data directly from a SQL database into a DataFrame. Once you've got the DataFrame, you can do with it as you wish.

Short example, taken from the docs:

import sqlite3
import pandas as pd

# Create your connection.
cnx = sqlite3.connect('mydbfile.sqlite')

# Read the result of the SQL query into a DataFrame.
data = pd.read_sql("SELECT * FROM data;", cnx)
cnx.close()

# Now you can write it into an HDF5 file (requires PyTables).
data.to_hdf('test_store.hdf', key='test', mode='w')
Carsten answered Sep 30 '22 09:09


Have a look at the SQLite LIMIT clause:

http://www.tutorialspoint.com/sqlite/sqlite_limit_clause.htm

The idea is to run a select * from table query repeatedly, limiting each result with LIMIT and an increasing OFFSET, and to write each batch to the HDF5 data store as shown above. First count the number of rows with select count(*) from table, then split the iteration into manageable chunks. For example, if there are 4 million records, read 200,000 at a time and increase the offset: 0, 200000, 400000, and so on.
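
The chunked approach described above might look roughly like the sketch below. It is untested on very large files and makes several assumptions: the helper names iter_chunks and sqlite_to_hdf5 are made up for illustration, the table is an ordinary rowid table (so ORDER BY rowid gives a stable order between queries), and PyTables is installed for the HDF5 write.

```python
import sqlite3

import pandas as pd


def iter_chunks(conn, table, chunk_size=200_000):
    """Yield DataFrames of at most chunk_size rows using LIMIT/OFFSET.

    Hypothetical helper: `table` is interpolated into the SQL text,
    so it must come from a trusted source.
    """
    # Count the rows first so we know how many offsets to walk through.
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    for offset in range(0, total, chunk_size):
        yield pd.read_sql_query(
            f'SELECT * FROM "{table}" ORDER BY rowid LIMIT ? OFFSET ?',
            conn,
            params=(chunk_size, offset),
        )


def sqlite_to_hdf5(db_path, table, hdf_path, key, chunk_size=200_000):
    """Copy one SQLite table into an HDF5 store, one chunk at a time."""
    conn = sqlite3.connect(db_path)
    try:
        for i, chunk in enumerate(iter_chunks(conn, table, chunk_size)):
            # The first chunk creates the file; later chunks are appended.
            # Appending requires the 'table' format.
            chunk.to_hdf(
                hdf_path,
                key=key,
                mode="w" if i == 0 else "a",
                format="table",
                append=(i > 0),
            )
    finally:
        conn.close()
```

A call such as sqlite_to_hdf5('mydbfile.sqlite', 'data', 'test_store.hdf', 'test') would then produce a file readable with pd.read_hdf('test_store.hdf', 'test'). Two caveats: each chunk keeps its own 0-based index, so the stored index repeats across chunks, and appending string columns can fail if a later chunk contains longer strings than the first one (to_hdf accepts a min_itemsize argument for that case).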

I need to do this to a very large SQLite file. Will report if it works.

Tooblippe answered Sep 30 '22 10:09