I found _datetimemodule.c, which seems to be the right file, but I need a bit of help as C is not my strength.
>>> import datetime
>>> import sys
>>> d = datetime.datetime.now()
>>> sys.getsizeof(d)
48
>>> d = datetime.datetime(2018, 12, 31, 23, 59, 59, 123)
>>> sys.getsizeof(d)
48
So a timezone-unaware datetime object needs 48 bytes. Looking at PyDateTime_DateTimeType, it seems to be made up of a PyDateTime_DateType and a PyDateTime_TimeType. Maybe also _PyDateTime_BaseTime?
From looking at the code, I have the impression that one component is stored for each field in YYYY-mm-dd HH:MM:ss, meaning:
Year: e.g. int16_t (which would be 16 bit)
Month: e.g. int8_t
Day: e.g. int8_t
Hour: e.g. int8_t
Minute: e.g. int8_t
Second: e.g. int8_t
Microsecond: e.g. uint16_t
But that would be 2*16 + 5*8 = 72 bits = 9 bytes, not the 48 bytes Python reports.
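To make that estimate concrete, here is a minimal sketch of the layout I have in mind, using Python's struct module. The format string and the name ASSUMED_LAYOUT are only illustrations of my assumption, not anything taken from CPython:

```python
import struct

# "<" turns off alignment padding, so the fields sit back to back.
# h = int16 (year), five b = int8 (month, day, hour, minute, second),
# H = uint16 (microsecond, as assumed in the list above).
ASSUMED_LAYOUT = "<hbbbbbH"

assumed_size = struct.calcsize(ASSUMED_LAYOUT)
print(assumed_size)  # 2 + 5*1 + 2 = 9
```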
Where is my assumption about the internal structure of datetime wrong? How can I see this in the code?
(I guess this might differ between Python implementations. If so, please focus on CPython.)
You're missing a key part of the picture: the actual datetime struct definitions, which live in Include/datetime.h. There are also important comments in there. Here are some key excerpts:
/* Fields are packed into successive bytes, each viewed as unsigned and
 * big-endian, unless otherwise noted:
 *
 * byte offset
 *  0           year     2 bytes, 1-9999
 *  2           month    1 byte, 1-12
 *  3           day      1 byte, 1-31
 *  4           hour     1 byte, 0-23
 *  5           minute   1 byte, 0-59
 *  6           second   1 byte, 0-59
 *  7           usecond  3 bytes, 0-999999
 * 10
 */
...
/* # of bytes for year, month, day, hour, minute, second, and usecond. */
#define _PyDateTime_DATETIME_DATASIZE 10
...
/* The datetime and time types have hashcodes, and an optional tzinfo member,
* present if and only if hastzinfo is true.
*/
#define _PyTZINFO_HEAD          \
    PyObject_HEAD               \
    Py_hash_t hashcode;         \
    char hastzinfo;             /* boolean flag */
...
/* All datetime objects are of PyDateTime_DateTimeType, but that can be
* allocated in two ways too, just like for time objects above. In addition,
* the plain date type is a base class for datetime, so it must also have
* a hastzinfo member (although it's unused there).
*/
...
#define _PyDateTime_DATETIMEHEAD        \
    _PyTZINFO_HEAD                      \
    unsigned char data[_PyDateTime_DATETIME_DATASIZE];

typedef struct
{
    _PyDateTime_DATETIMEHEAD
} _PyDateTime_BaseDateTime;     /* hastzinfo false */

typedef struct
{
    _PyDateTime_DATETIMEHEAD
    unsigned char fold;
    PyObject *tzinfo;
} PyDateTime_DateTime;          /* hastzinfo true */
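If it helps to see the packed layout from the header comment without reading C, here is a rough Python sketch that builds the same 10 bytes by hand. The helper name pack_datetime_fields is made up for this illustration; it only mimics the layout described in the comment and does not read the object's real memory:

```python
import datetime


def pack_datetime_fields(d):
    """Pack d's fields into 10 bytes, following the header comment."""
    return (
        d.year.to_bytes(2, "big")                                # offset 0, 2 bytes
        + bytes([d.month, d.day, d.hour, d.minute, d.second])    # offsets 2-6
        + d.microsecond.to_bytes(3, "big")                       # offset 7, 3 bytes
    )


d = datetime.datetime(2018, 12, 31, 23, 59, 59, 123)
packed = pack_datetime_fields(d)
print(len(packed))   # 10
print(packed.hex())  # 07e20c1f173b3b00007b
```

Note that the microsecond field needs 3 bytes, because values up to 999999 don't fit into 16 bits.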
Additionally, note the following lines in Modules/_datetimemodule.c:
static PyTypeObject PyDateTime_DateTimeType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "datetime.datetime",                        /* tp_name */
    sizeof(PyDateTime_DateTime),                /* tp_basicsize */
That tp_basicsize line says sizeof(PyDateTime_DateTime), not sizeof(_PyDateTime_BaseDateTime), and the type doesn't implement any special __sizeof__ handling. That means the datetime.datetime type reports its instance size as the size of a time-zone aware datetime, even for unaware instances.
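You can observe this from Python. The following is a small sketch, assuming a CPython build that uses the C implementation of datetime; if the explanation above holds for your build, the naive and aware instances report the same size, and that size is derived from the type's __basicsize__:

```python
import datetime
import sys

naive = datetime.datetime(2018, 12, 31, 23, 59, 59, 123)
aware = naive.replace(tzinfo=datetime.timezone.utc)

naive_size = sys.getsizeof(naive)
aware_size = sys.getsizeof(aware)
# __basicsize__ exposes the type's tp_basicsize slot.
basic_size = datetime.datetime.__basicsize__

print(naive_size, aware_size, basic_size)
```

The exact numbers depend on the build, so treat them as something to inspect rather than rely on.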
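The 48-byte count you're seeing breaks down as follows (on a 64-bit build):
8-byte refcount
8-byte type pointer
8-byte cached hash
1-byte hastzinfo flag
10-byte manually packed unsigned char[10] containing datetime data
1-byte fold flag
4 bytes of padding, to align the tzinfo pointer
8-byte tzinfo pointer
This is true even though the actual memory layout of your unaware instance doesn't have a fold flag or tzinfo pointer.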
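One way to double-check the arithmetic is to mirror the two structs with ctypes and let it apply the platform's alignment rules. This is only a sketch: the class names are invented here, and it assumes that PyObject_HEAD consists of just a refcount and a type pointer, which is not the case on debug or other special builds:

```python
import ctypes

# Fields shared by both structs: PyObject_HEAD (assumed to be a refcount
# plus a type pointer), the cached hash, the flag, and the packed data.
COMMON_FIELDS = [
    ("ob_refcnt", ctypes.c_ssize_t),
    ("ob_type", ctypes.c_void_p),
    ("hashcode", ctypes.c_ssize_t),
    ("hastzinfo", ctypes.c_char),
    ("data", ctypes.c_ubyte * 10),
]


class BaseDateTimeMirror(ctypes.Structure):
    """Stand-in for _PyDateTime_BaseDateTime (hastzinfo false)."""
    _fields_ = COMMON_FIELDS


class DateTimeMirror(ctypes.Structure):
    """Stand-in for PyDateTime_DateTime (hastzinfo true)."""
    _fields_ = COMMON_FIELDS + [
        ("fold", ctypes.c_ubyte),
        ("tzinfo", ctypes.c_void_p),
    ]


naive_layout_size = ctypes.sizeof(BaseDateTimeMirror)
aware_layout_size = ctypes.sizeof(DateTimeMirror)
print(naive_layout_size, aware_layout_size)
print(DateTimeMirror.fold.offset, DateTimeMirror.tzinfo.offset)
```

On a 64-bit build the second struct comes out at 48 bytes, with the tzinfo pointer pushed to offset 40 by the padding after the fold flag.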
These are, of course, all implementation details. The numbers may be different on a different Python implementation, a different CPython version, a 32-bit CPython build, or a CPython debug build (there's extra stuff in the PyObject_HEAD when CPython is compiled with Py_TRACE_REFS defined).
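If you want to know which of those cases applies to your interpreter, a quick sketch like the following can help. The function name describe_build is made up for this example, and the two hasattr checks are heuristics as far as I know (sys.gettotalrefcount on debug builds, sys.getobjects on Py_TRACE_REFS builds), not an official API for detecting build options:

```python
import struct
import sys


def describe_build():
    """Collect a few facts that influence object sizes."""
    return {
        "implementation": sys.implementation.name,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # Size of a C pointer: 8 on 64-bit builds, 4 on 32-bit builds.
        "pointer_bytes": struct.calcsize("P"),
        # Heuristic: present on CPython debug builds.
        "debug_build": hasattr(sys, "gettotalrefcount"),
        # Heuristic: present when CPython was compiled with Py_TRACE_REFS.
        "trace_refs_build": hasattr(sys, "getobjects"),
    }


info = describe_build()
for key, value in info.items():
    print(f"{key}: {value}")
```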