I want to store numpy datetime64 data in a PyTables Table. I want to do this without using Pandas.
In [1]: import tables as tb
In [2]: import numpy as np
In [3]: from datetime import datetime
In [4]: data = [(1, datetime(2000, 1, 1, 1, 1, 1)), (2, datetime(2001, 2, 2, 2, 2, 2))]
In [5]: rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')])
In [6]: rec # a numpy array with my data
Out[6]:
array([(1, datetime.datetime(2000, 1, 1, 1, 1, 1)),
(2, datetime.datetime(2001, 2, 2, 2, 2, 2))],
dtype=[('a', '<i4'), ('b', '<M8[us]')])
I create a table with a Time64Col descriptor:
In [7]: f = tb.open_file('foo.h5', 'w') # New PyTables file
In [8]: d = f.create_table('/', 'bar', description={'a': tb.Int32Col(pos=0),
'b': tb.Time64Col(pos=1)})
In [9]: d
Out[9]:
/bar (Table(0,)) ''
description := {
"a": Int32Col(shape=(), dflt=0, pos=0),
"b": Time64Col(shape=(), dflt=0.0, pos=1)}
byteorder := 'little'
chunkshape := (5461,)
In [10]: d.append(rec)
In [11]: d
Out[11]:
/bar (Table(2,)) ''
description := {
"a": Int32Col(shape=(), dflt=0, pos=0),
"b": Time64Col(shape=(), dflt=0.0, pos=1)}
byteorder := 'little'
chunkshape := (5461,)
In [12]: d[:]
Out[12]:
array([(1, 0.0), (2, 0.0)],
dtype=[('a', '<i4'), ('b', '<f8')])
I understand that HDF5 doesn't provide native support for datetimes. I would expect the extra metadata that PyTables overlays to handle this, though.
How can I store a numpy record array that contains datetimes in PyTables? How can I efficiently extract that data from a PyTables table back to a NumPy array and retain my datetimes?
I commonly get this answer:
Use Pandas
I don't want to use Pandas because I don't have an index, I don't want one stored in my dataset, and Pandas doesn't let you omit storing an index (see this question).
First, when putting the values into a Time64Col, they need to be float64s. You can do this with a call to astype, like so:
new_rec = rec.astype([('a', 'i4'), ('b', 'f8')])
Then you need to convert column b to seconds since the epoch, which means dividing by 1,000,000 since the values are in microseconds:
new_rec['b'] = new_rec['b'] / 1e6
Then call d.append(new_rec).
When you read the array back into memory, do the reverse and multiply by 1,000,000. You'll also have to make sure everything is in microseconds before writing, which astype('datetime64[us]') handles automatically in numpy >= 1.7.x.
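For instance, if your data happens to be stored at some other resolution, the normalization step can be sketched like this (a minimal sketch; the array `ms_rec` and its field names are made up for illustration):

```python
import numpy as np

# Example data stored at millisecond resolution rather than microseconds
ms_rec = np.array([(1, np.datetime64('2000-01-01T01:01:01', 'ms'))],
                  dtype=[('a', 'i4'), ('b', 'M8[ms]')])

# Normalize the datetime field to microseconds first...
us_rec = ms_rec.astype([('a', 'i4'), ('b', 'M8[us]')])

# ...then convert to float64 and rescale to seconds since the epoch,
# which is what a Time64Col expects
flt_rec = us_rec.astype([('a', 'i4'), ('b', 'f8')])
flt_rec['b'] /= 1e6  # microseconds -> seconds
```

Skipping the intermediate astype('datetime64[us]') would leave the float column in the wrong unit (milliseconds here), so the /1e6 rescale would be off by a factor of 1,000.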
I used the solution from this question: How to get unix timestamp from numpy.datetime64
Here's a working version of your example:
In [4]: data = [(1, datetime(2000, 1, 1, 1, 1, 1)), (2, datetime(2001, 2, 2, 2, 2, 2))]
In [5]: rec = np.array(data, dtype=[('a', 'i4'), ('b', 'M8[us]')])
In [6]: new_rec = rec.astype([('a', 'i4'), ('b', 'f8')])
In [7]: new_rec
Out[7]:
array([(1, 946688461000000.0), (2, 981079322000000.0)],
dtype=[('a', '<i4'), ('b', '<f8')])
In [8]: new_rec['b'] /= 1e6
In [9]: new_rec
Out[9]:
array([(1, 946688461.0), (2, 981079322.0)],
dtype=[('a', '<i4'), ('b', '<f8')])
In [10]: f = tb.open_file('foo.h5', 'w') # New PyTables file
In [11]: d = f.create_table('/', 'bar', description={'a': tb.Int32Col(pos=0),
....: 'b': tb.Time64Col(pos=1)})
In [12]: d.append(new_rec)
In [13]: d[:]
Out[13]:
array([(1, 946688461.0), (2, 981079322.0)],
dtype=[('a', '<i4'), ('b', '<f8')])
In [14]: r = d[:]
In [15]: r['b'] *= 1e6
In [16]: r.astype([('a', 'i4'), ('b', 'datetime64[us]')])
Out[16]:
array([(1, datetime.datetime(2000, 1, 1, 1, 1, 1)),
(2, datetime.datetime(2001, 2, 2, 2, 2, 2))],
dtype=[('a', '<i4'), ('b', '<M8[us]')])
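The round trip above can be wrapped in a pair of helpers. This is just a sketch; the names `encode_time64` and `decode_time64` are hypothetical, and it assumes the datetime fields of the input are (or can be cast to) datetime64[us]:

```python
import numpy as np

def encode_time64(rec, time_fields=('b',)):
    """Convert datetime64 fields to float64 epoch seconds for Time64Col storage."""
    # Normalize datetime fields to microseconds first (numpy >= 1.7)
    us_dtype = [(n, 'M8[us]' if n in time_fields else rec.dtype[n].str)
                for n in rec.dtype.names]
    flt_dtype = [(n, 'f8' if n in time_fields else rec.dtype[n].str)
                 for n in rec.dtype.names]
    out = rec.astype(us_dtype).astype(flt_dtype)
    for n in time_fields:
        out[n] /= 1e6  # microseconds -> seconds
    return out

def decode_time64(rec, time_fields=('b',)):
    """Convert float64 epoch seconds back to datetime64[us] fields."""
    out = rec.copy()
    for n in time_fields:
        out[n] *= 1e6  # seconds -> microseconds
    dt_dtype = [(n, 'M8[us]' if n in time_fields else rec.dtype[n].str)
                for n in rec.dtype.names]
    return out.astype(dt_dtype)
```

With the rec from In [5] above, encode_time64(rec) gives the float array that d.append accepts, and decode_time64(d[:]) restores the original datetime64 dtype.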