
How does Python store datetime internally?

I found _datetimemodule.c which seems to be the right file, but I need a bit of help as C is not my strength.

>>> import datetime
>>> import sys
>>> d = datetime.datetime.now()
>>> sys.getsizeof(d)
48
>>> d = datetime.datetime(2018, 12, 31, 23, 59, 59, 123)
>>> sys.getsizeof(d)
48

So a timezone-unaware datetime object needs 48 bytes. Looking at PyDateTime_DateTimeType, it seems to be composed of a PyDateTime_DateType and a PyDateTime_TimeType. Maybe also _PyDateTime_BaseTime?

From looking at the code, I have the impression that one component is stored for each field in YYYY-mm-dd HH:MM:ss, meaning:

  • Year: e.g. int16_t (16 bits)
  • Month: e.g. int8_t
  • Day: e.g. int8_t
  • Hour: e.g. int8_t
  • Minute: e.g. int8_t
  • Second: e.g. int8_t
  • Microsecond: e.g. uint16_t

But that would be 2 * 16 + 5 * 8 = 72 bits = 9 bytes, not the 48 bytes that Python reports.

Where is my assumption about the internal structure of datetime wrong? How can I see this in the code?

(I guess this might differ between Python implementations. If so, please focus on CPython.)

Martin Thoma asked Oct 20 '25 02:10


1 Answer

You're missing a key part of the picture: the actual datetime struct definitions, which lie in Include/datetime.h. There are also important comments in there. Here are some key excerpts:

/* Fields are packed into successive bytes, each viewed as unsigned and
 * big-endian, unless otherwise noted:
 *
 * byte offset
 *  0           year     2 bytes, 1-9999
 *  2           month    1 byte, 1-12
 *  3           day      1 byte, 1-31
 *  4           hour     1 byte, 0-23
 *  5           minute   1 byte, 0-59
 *  6           second   1 byte, 0-59
 *  7           usecond  3 bytes, 0-999999
 * 10
 */

...

/* # of bytes for year, month, day, hour, minute, second, and usecond. */
#define _PyDateTime_DATETIME_DATASIZE 10

...

/* The datetime and time types have hashcodes, and an optional tzinfo member,
 * present if and only if hastzinfo is true.
 */
#define _PyTZINFO_HEAD          \
    PyObject_HEAD               \
    Py_hash_t hashcode;         \
    char hastzinfo;             /* boolean flag */

...

/* All datetime objects are of PyDateTime_DateTimeType, but that can be
 * allocated in two ways too, just like for time objects above.  In addition,
 * the plain date type is a base class for datetime, so it must also have
 * a hastzinfo member (although it's unused there).
 */

...

#define _PyDateTime_DATETIMEHEAD        \
    _PyTZINFO_HEAD                      \
    unsigned char data[_PyDateTime_DATETIME_DATASIZE];

typedef struct
{
    _PyDateTime_DATETIMEHEAD
} _PyDateTime_BaseDateTime;     /* hastzinfo false */

typedef struct
{
    _PyDateTime_DATETIMEHEAD
    unsigned char fold;
    PyObject *tzinfo;
} PyDateTime_DateTime;          /* hastzinfo true */
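
To make the byte layout from the header comment concrete, here is a minimal Python sketch that packs the same seven fields into 10 bytes (2-byte big-endian year, five 1-byte fields, 3-byte big-endian microsecond). The function name pack_datetime is hypothetical and exists only for illustration; it mimics the layout and is not how CPython itself fills the data array.

```python
def pack_datetime(year, month, day, hour, minute, second, microsecond):
    """Pack the fields into 10 bytes, mirroring the layout in datetime.h.

    Hypothetical helper for illustration only.
    """
    return (
        year.to_bytes(2, "big")                       # offset 0: year, 2 bytes
        + bytes([month, day, hour, minute, second])   # offsets 2-6: 1 byte each
        + microsecond.to_bytes(3, "big")              # offset 7: usecond, 3 bytes
    )


packed = pack_datetime(2018, 12, 31, 23, 59, 59, 123)
print(len(packed))   # 10
print(packed.hex())  # 07e20c1f173b3b00007b
```

On CPython, the pickle state of a naive datetime (the first element of d.__reduce__()[1]) appears to carry these same 10 bytes when fold is 0, but that is an implementation detail and should not be relied on.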

Additionally, note the following lines in Modules/_datetimemodule.c:

static PyTypeObject PyDateTime_DateTimeType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "datetime.datetime",                        /* tp_name */
    sizeof(PyDateTime_DateTime),                /* tp_basicsize */

That tp_basicsize line says sizeof(PyDateTime_DateTime), not sizeof(_PyDateTime_BaseDateTime), and the type doesn't implement any special __sizeof__ handling. That means the datetime.datetime type reports its instance size as the size of a time-zone aware datetime, even for unaware instances.
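
This can be checked from Python, as a rough sketch assuming a CPython build: the type's __basicsize__ attribute exposes tp_basicsize, and sys.getsizeof should report the same number for naive and aware instances. The exact values depend on the platform and build, so none are shown here.

```python
import datetime
import sys

naive = datetime.datetime(2018, 12, 31, 23, 59, 59, 123)
aware = naive.replace(tzinfo=datetime.timezone.utc)

# tp_basicsize as seen from Python (CPython-specific attribute value).
basic_size = datetime.datetime.__basicsize__

# Sizes reported for an unaware and an aware instance.
naive_size = sys.getsizeof(naive)
aware_size = sys.getsizeof(aware)

print(basic_size, naive_size, aware_size)
```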

The 48-byte count you're seeing breaks down as follows:

  • 8-byte refcount
  • 8-byte type pointer
  • 8-byte cached hash
  • 1-byte "hastzinfo" flag
  • 10-byte manually packed unsigned char[10] containing datetime data
  • 1-byte "fold" flag (DST-related)
  • 4-byte padding, to align the tzinfo pointer
  • 8-byte tzinfo pointer

This is true even though the actual memory layout of your unaware instance doesn't have a fold flag or tzinfo pointer.
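
The breakdown above can be reproduced with a ctypes structure that mirrors the field order of PyDateTime_DateTime. This is only a sketch: the class name AwareDateTimeLayout is hypothetical, and it assumes a regular (non-debug, non-free-threaded) build where PyObject_HEAD is just a reference count and a type pointer, and where Py_hash_t has the size of Py_ssize_t. ctypes applies the platform's native alignment, so the padding before the tzinfo pointer shows up automatically.

```python
import ctypes


class AwareDateTimeLayout(ctypes.Structure):
    """Hypothetical mirror of PyDateTime_DateTime, for size arithmetic only."""

    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),   # PyObject_HEAD: reference count
        ("ob_type", ctypes.c_void_p),      # PyObject_HEAD: type pointer
        ("hashcode", ctypes.c_ssize_t),    # cached hash
        ("hastzinfo", ctypes.c_char),      # boolean flag
        ("data", ctypes.c_ubyte * 10),     # packed date and time fields
        ("fold", ctypes.c_ubyte),          # fold flag
        ("tzinfo", ctypes.c_void_p),       # pointer, aligned by the compiler
    ]


# Print each field's offset and size, then the total struct size.
for name, _ in AwareDateTimeLayout._fields_:
    field = getattr(AwareDateTimeLayout, name)
    print(f"{name:10} offset={field.offset:2} size={field.size}")

total_size = ctypes.sizeof(AwareDateTimeLayout)
print("total:", total_size)
```

On a typical 64-bit platform this should put fold at offset 35 and tzinfo at offset 40, leaving 4 bytes of padding in between, for a total of 48 bytes.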

These are, of course, all implementation details. They may differ on another Python implementation, a different CPython version, a 32-bit CPython build, or a CPython debug build (there's extra stuff in the PyObject_HEAD when CPython is compiled with Py_TRACE_REFS defined).

user2357112 supports Monica answered Oct 21 '25 19:10


