I want to store numpy datetime64 data in a PyTables Table. I want to do this without using Pandas.
In [1]: import tables as tb
In [2]: import numpy as np
In [3]: from datetime import datetime
In [4]: data = [(1, datetime(2000, 1, 1, 1, 1, 1)), (2, datetime(2001, 2, 2, 2, 2, 2))]
In [5]: rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')])
In [6]: rec # a numpy array with my data
Out[6]:
array([(1, datetime.datetime(2000, 1, 1, 1, 1, 1)),
(2, datetime.datetime(2001, 2, 2, 2, 2, 2))],
dtype=[('a', '<i4'), ('b', '<M8[us]')])
I create a table with a Time64Col descriptor:
In [7]: f = tb.open_file('foo.h5', 'w') # New PyTables file
In [8]: d = f.create_table('/', 'bar', description={'a': tb.Int32Col(pos=0),
'b': tb.Time64Col(pos=1)})
In [9]: d
Out[9]:
/bar (Table(0,)) ''
description := {
"a": Int32Col(shape=(), dflt=0, pos=0),
"b": Time64Col(shape=(), dflt=0.0, pos=1)}
byteorder := 'little'
chunkshape := (5461,)
In [10]: d.append(rec)
In [11]: d
Out[11]:
/bar (Table(2,)) ''
description := {
"a": Int32Col(shape=(), dflt=0, pos=0),
"b": Time64Col(shape=(), dflt=0.0, pos=1)}
byteorder := 'little'
chunkshape := (5461,)
In [12]: d[:]
Out[12]:
array([(1, 0.0), (2, 0.0)],
dtype=[('a', '<i4'), ('b', '<f8')])
I understand that HDF5 doesn't provide native support for datetimes. I would expect the extra metadata that PyTables overlays to handle this, though.
How can I store a numpy record array that contains datetimes in PyTables? How can I efficiently extract that data from a PyTables table back to a NumPy array and retain my datetimes?
I commonly get this answer:
Use Pandas
I don't want to use Pandas because I don't have an index, I don't want one stored in my dataset, and Pandas doesn't let you omit storing an index (see this question).
First, when putting the values into a Time64Col, they need to be float64s. You can do this with a call to astype, like so:
new_rec = rec.astype([('a', 'i4'), ('b', 'f8')])
Then you need to convert column b to seconds since the epoch, which means dividing by 1,000,000 since the values are in microseconds:
new_rec['b'] = new_rec['b'] / 1e6
Then call d.append(new_rec).
When you read the array back into memory, do the reverse and multiply by 1,000,000. You'll also have to make sure everything is in microseconds before writing, which astype('datetime64[us]') handles automatically in numpy >= 1.7.x.
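For instance, if your data happens to be stored at some other resolution, the normalization step can be sketched like this (a minimal sketch; the array `ms_rec` and its field names are made up for illustration):

```python
import numpy as np

# Example data stored at millisecond resolution rather than microseconds
ms_rec = np.array([(1, np.datetime64('2000-01-01T01:01:01', 'ms'))],
                  dtype=[('a', 'i4'), ('b', 'M8[ms]')])

# Normalize the datetime field to microseconds first...
us_rec = ms_rec.astype([('a', 'i4'), ('b', 'M8[us]')])

# ...then convert to float64 and rescale to seconds since the epoch,
# which is what a Time64Col expects
flt_rec = us_rec.astype([('a', 'i4'), ('b', 'f8')])
flt_rec['b'] /= 1e6  # microseconds -> seconds
```

Skipping the intermediate astype('datetime64[us]') would leave the float column in the wrong unit (milliseconds here), so the /1e6 rescale would be off by a factor of 1,000.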
I used the solution from this question: How to get unix timestamp from numpy.datetime64
Here's a working version of your example:
In [4]: data = [(1, datetime(2000, 1, 1, 1, 1, 1)), (2, datetime(2001, 2, 2, 2, 2, 2))]
In [5]: rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')])
In [6]: new_rec = rec.astype([('a', 'i4'), ('b', 'f8')])
In [7]: new_rec
Out[7]:
array([(1, 946688461000000.0), (2, 981079322000000.0)],
dtype=[('a', '<i4'), ('b', '<f8')])
In [8]: new_rec['b'] /= 1e6
In [9]: new_rec
Out[9]:
array([(1, 946688461.0), (2, 981079322.0)],
dtype=[('a', '<i4'), ('b', '<f8')])
In [10]: f = tb.open_file('foo.h5', 'w') # New PyTables file
In [11]: d = f.create_table('/', 'bar', description={'a': tb.Int32Col(pos=0),
....: 'b': tb.Time64Col(pos=1)})
In [12]: d.append(new_rec)
In [13]: d[:]
Out[13]:
array([(1, 946688461.0), (2, 981079322.0)],
dtype=[('a', '<i4'), ('b', '<f8')])
In [14]: r = d[:]
In [15]: r['b'] *= 1e6
In [16]: r.astype([('a', 'i4'), ('b', 'datetime64[us]')])
Out[16]:
array([(1, datetime.datetime(2000, 1, 1, 1, 1, 1)),
(2, datetime.datetime(2001, 2, 2, 2, 2, 2))],
dtype=[('a', '<i4'), ('b', '<M8[us]')])
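The round trip above can be wrapped in a pair of helpers. This is just a sketch; the names `encode_time64` and `decode_time64` are hypothetical, and it assumes the datetime fields of the input are (or can be cast to) datetime64[us]:

```python
import numpy as np

def encode_time64(rec, time_fields=('b',)):
    """Convert datetime64 fields to float64 epoch seconds for Time64Col storage."""
    # Normalize datetime fields to microseconds first (numpy >= 1.7)
    us_dtype = [(n, 'M8[us]' if n in time_fields else rec.dtype[n].str)
                for n in rec.dtype.names]
    flt_dtype = [(n, 'f8' if n in time_fields else rec.dtype[n].str)
                 for n in rec.dtype.names]
    out = rec.astype(us_dtype).astype(flt_dtype)
    for n in time_fields:
        out[n] /= 1e6  # microseconds -> seconds
    return out

def decode_time64(rec, time_fields=('b',)):
    """Convert float64 epoch seconds back to datetime64[us] fields."""
    out = rec.copy()
    for n in time_fields:
        out[n] *= 1e6  # seconds -> microseconds
    dt_dtype = [(n, 'M8[us]' if n in time_fields else rec.dtype[n].str)
                for n in rec.dtype.names]
    return out.astype(dt_dtype)
```

With the rec from In [5] above, encode_time64(rec) gives the float array that d.append accepts, and decode_time64(d[:]) restores the original datetime64 dtype.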