A Pandas DataFrame can be converted to an HDF5 file like this:
df.to_hdf('test_store.hdf','test',mode='w')
I have an SQLite db file which has to be converted to an HDF5 file, and then I would read the HDF5 file through pandas using pd.read_hdf.
But first, how do I convert a Python SQLite db to an HDF5 file?
EDIT:
I am aware of the .read_sql method in pandas, but I would like to convert the db to HDF5 first.
This is surprisingly simple: Use pandas!
pandas supports reading data directly from a SQL database into a DataFrame. Once you've got the DataFrame, you can do with it as you wish.
Short example, taken from the docs:
import sqlite3
import pandas as pd

# Create your connection.
cnx = sqlite3.connect('mydbfile.sqlite')
# Read the result of the SQL query into a DataFrame.
data = pd.read_sql("SELECT * FROM data;", cnx)
# Now you can write it into an HDF5 file.
data.to_hdf('test_store.hdf', 'test', mode='w')
Have a look at this:
http://www.tutorialspoint.com/sqlite/sqlite_limit_clause.htm
The idea would be to iterate a select * from table query and limit the results with an increasing offset, writing each batch of results to the HDF5 store as shown above. First count the number of entries with select count(*) from table, then split the iteration into manageable chunks. E.g., if there are 4 million records, read 200,000 at a time and increase the offset through 0, 200000, 400000, etc.
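A minimal sketch of that loop, using only the stdlib sqlite3 module (the table name data, its columns, and the chunk size are placeholders for the example; the HDF5 append step is shown as a comment since it depends on PyTables being installed):

```python
import sqlite3

# Set up a small throwaway database to demonstrate the pattern.
# The table name and columns here are made up for the example.
cnx = sqlite3.connect(":memory:")
cnx.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, value REAL)")
cnx.executemany("INSERT INTO data (value) VALUES (?)",
                [(float(i),) for i in range(10)])
cnx.commit()

# First count the rows, then page through with LIMIT/OFFSET.
total = cnx.execute("SELECT COUNT(*) FROM data").fetchone()[0]
chunk_size = 4

chunks = []
for offset in range(0, total, chunk_size):
    rows = cnx.execute(
        "SELECT * FROM data LIMIT ? OFFSET ?",
        (chunk_size, offset)).fetchall()
    chunks.append(rows)
    # With pandas, each chunk could instead be read via pd.read_sql
    # and appended to the store in table format, e.g.:
    # chunk_df.to_hdf('test_store.hdf', 'test', mode='a',
    #                 format='table', append=True)

print([len(c) for c in chunks])  # chunk sizes: [4, 4, 2]
```

With a real multi-million-row table, you would replace the in-memory setup with a connection to your .sqlite file and raise chunk_size to something like 200,000.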
I need to do this for a very large SQLite file; I will report back if it works.