Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does this query give different results depending on how I arrange my DateTime arithmetic?

I have used SqlAlchemy to create a table, Record. Each record has a field, date, which stores a DateTime. I want to find all records whose date is more recent than eight hours ago.

I came up with four ways to write a filter, all involving simple arithmetic comparing the current time, the record's time, and an eight hour timedelta. The problem is: half of these filters return rows outside of the eight hour window.

from sqlalchemy import Column, Integer, DateTime
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
import datetime

Base = declarative_base()

class Record(Base):
    __tablename__ = 'record'
    id = Column(Integer, primary_key=True)
    date = Column(DateTime, nullable=False)

engine = create_engine('sqlite:///records.db')
Base.metadata.create_all(engine)
DBSession = sessionmaker(bind=engine)
session = DBSession()

#if the db is empty, add some records to the database with datetimes corresponding to one year ago and one hour ago and yesterday
now = datetime.datetime(2018, 4, 4, 10, 0, 0)
if not session.query(Record).all():
    session.add(Record(date = now - datetime.timedelta(days=365)))
    session.add(Record(date = now - datetime.timedelta(days=1)))
    session.add(Record(date = now - datetime.timedelta(hours=1)))


delta = datetime.timedelta(hours=8)

#these are all equivalent to "records from the last eight hours"
criterion = [
    (now - Record.date < delta),
    (Record.date > now - delta),
    (delta > now - Record.date),
    (now - delta < Record.date),
]

for idx, crit in enumerate(criterion):
    query = session.query(Record).filter(crit)
    print("\n\nApproach #{}.".format(idx))
    print("Generated statement:")
    print(query.statement)
    records = query.all()
    print("{} row(s) retrieved.".format(len(records)))
    for record in query.all():
        print(record.id, record.date)

Result:

Approach #0.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE :date_1 - record.date < :param_1
3 row(s) retrieved.
1 2017-04-04 10:00:00
2 2018-04-03 10:00:00
3 2018-04-04 09:00:00


Approach #1.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE record.date > :date_1
1 row(s) retrieved.
3 2018-04-04 09:00:00


Approach #2.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE :date_1 - record.date < :param_1
3 row(s) retrieved.
1 2017-04-04 10:00:00
2 2018-04-03 10:00:00
3 2018-04-04 09:00:00


Approach #3.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE record.date > :date_1
1 row(s) retrieved.
3 2018-04-04 09:00:00

Approaches 1 and 3 are correct - they return the record from one hour ago, and not the record from one day ago or one year ago. Approaches 0 and 2 are incorrect because they return the record from one day ago and the record from one year ago in addition to the record from one hour ago.

What is causing this discrepancy? I notice that #1 and #3 generate statements that only parameterize a single datetime object, while #0 and #2 parameterize both a datetime object and a timedelta object. Are timedeltas parameterized in an unusual way that would make them unsuitable for arithmetic of this kind?

like image 824
Kevin Avatar asked Apr 04 '18 15:04

Kevin


1 Answers

As noted by unutbu, when timedelta objects are used as bind parameters against databases that do not support a native Interval type, they're converted to timestamps relative to "epoch" (1. Jan 1970). SQLite is such a database, as is MySQL. Another notable thing when you turn on logging is that the datetime values are stored and passed as ISO formatted strings.

A DATETIME column has NUMERIC affinity in SQLite, but since the ISO formatted strings cannot be losslessly converted to a numeric value, they retain their TEXT storage class. This on the other hand is fine, since the 3 ways to store date and time data in SQLite are

  • TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
  • REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
  • INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.

Things become a bit more interesting when you try to perform arithmetic in the database, though:

In [18]: session.execute('SELECT :date_1 - record.date FROM record',
    ...:                 {"date_1": now}).fetchall()
2018-04-04 20:47:35,045 INFO sqlalchemy.engine.base.Engine SELECT ? - record.date FROM record
INFO:sqlalchemy.engine.base.Engine:SELECT ? - record.date FROM record
2018-04-04 20:47:35,045 INFO sqlalchemy.engine.base.Engine (datetime.datetime(2018, 4, 4, 10, 0),)
INFO:sqlalchemy.engine.base.Engine:(datetime.datetime(2018, 4, 4, 10, 0),)
Out[18]: [(1,), (0,), (0,)]

The reason is that all mathematical operators cast their operands to NUMERIC storage class, even if the resulting values are lossy – or don't make sense for that matter. In this case the year part is parsed and the rest ignored.

Since any INTEGER or REAL value is less than any TEXT or BLOB value, all the comparisons between the resulting integer values and given ISO formatted interval strings are true:

In [25]: session.execute(text('SELECT :date_1 - record.date < :param_1 FROM record')
    ...:                 .bindparams(bindparam('param_1', type_=Interval)),
    ...:                 {"date_1": now, "param_1": delta}).fetchall()
    ...:                 
2018-04-04 20:55:36,952 INFO sqlalchemy.engine.base.Engine SELECT ? - record.date < ? FROM record
INFO:sqlalchemy.engine.base.Engine:SELECT ? - record.date < ? FROM record
2018-04-04 20:55:36,952 INFO sqlalchemy.engine.base.Engine (datetime.datetime(2018, 4, 4, 10, 0), '1970-01-01 08:00:00.000000')
INFO:sqlalchemy.engine.base.Engine:(datetime.datetime(2018, 4, 4, 10, 0), '1970-01-01 08:00:00.000000')
Out[25]: [(1,), (1,), (1,)]

Some might call all this a leaky abstraction, but it'd be a daunting – or impossible – task to provide solutions in SQLAlchemy to all the differences between database implementations. Personally I find it handier that it does not get in the way, but allows using the database's features pretty much as is, but with a nice Python DSL. If you truly need to support time diffs in different databases in a single code base, create a custom construct with suitable database specific compilers.

To actually compute the difference in SQLite and compare against total seconds in a given timedelta you need to use the strftime() function in order to convert the ISO formatted string to seconds since epoch. julianday() would also work, as long as you convert the Python datetime as well and convert the result to seconds. Replace the 2 misbehaving comparisons with for example:

# Not sure if your times were supposed to be UTC or not
now_ts = now.replace(tzinfo=datetime.timezone.utc).timestamp()
delta_s = delta.total_seconds()

# Not quite pretty...
criterion = [
    (now_ts - func.strftime('%s', Record.date) < delta_s,
    (Record.date > now - delta),
    (delta_s > now_ts - func.strftime('%s', Record.date)),
    (now - delta < Record.date),
]
like image 108
Ilja Everilä Avatar answered Nov 15 '22 18:11

Ilja Everilä