Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently group by date with a timezone specified?

I'm currently running an aggregate query, summing amounts sold (say) on a given date.

select convert(date, datetimesold), sum(amountsold) from tblSold 
group by convert(date, datetimesold)

where datetimesold is a datetime value.

The convert(date,...) gets rid of the time value, so the group by can group by whole day.

Already this isn't efficient, as it requires a table scan for the convert per row - a better approach would be to add a 'datesold' column containing just the date value, indexed, and include this value on insert each time. But this would lose precision on that column, and that is important, because...

datetimesold is a UTC datetime. So my question is: say I wanted to group by day, but on Eastern US time. I would have to add an offset in hours to datetimesold before doing the convert in the group by - group by convert(date, dateadd(hours, -5, datetimesold)) - but even then this wouldn't be always accurate due to daylight savings - EDT -4 hours, EST -5 hours.

Do I have any efficient options to do this in SQL? Are there any timezone-aware functions that I can use here?

EDIT: To further clarify, I'm operating on Azure SQL databases.

like image 220
oflahero Avatar asked Oct 30 '22 05:10

oflahero


1 Answers

You already noticed that it is difficult to convert from UTC to local time zone correctly. In fact, it is very difficult, because rules for daylight savings change. You need to maintain a historical database of timezones to do it properly.

I store two timestamps - in UTC and in local time zone. In some reports we need UTC, in some local.

Usually it is easy to convert between UTC and local time zone when the row is inserted and the OS of the client computer, which generates the data is in correct local time zone. At that moment OS knows both local and UTC time. But, if you have historical data from previous years it becomes much more difficult to perform such conversion.

SQL Server 2016 promises to add somewhat better support for time zones, see: AT TIME ZONE.


As for your concern about table scan - you'll always have to scan the whole table to calculate the SUM, so extra CONVERT to date doesn't really matter.

On the other hand,

If you have a separate column that stores just date, not datetime, the query will be a bit more efficient, because date takes less bytes than datetime, so less bytes to read from disk.

If you add an index on (datesold, amountsold), then the GROUP BY would not have to do an extra sort, which also makes a query more efficient.


So, in the current version of SQL Server I'd add an indexed date column which would contain a date in the time zone that you need for your reports. If there is a need for reports in UTC and Eastern US time zones, I'd add two separate date columns.

like image 186
Vladimir Baranov Avatar answered Nov 15 '22 05:11

Vladimir Baranov