gnuplot: How to get correct week numbers?

This question follows from "gnuplot why warning: Bad time format in string", where it turned out that the week numbers gnuplot produces with the time specifiers %W and %U are wrong in some cases.

There are several definitions of week numbers, and they also differ in when a week starts, e.g. on Sunday or on Monday. One commonly used definition (although not in the US and a few other countries) is the one in ISO 8601.

Code: (to illustrate wrong week numbers)

### wrong week numbering in gnuplot with %W and %U
reset session

StartDate = "24.12.2020"
myTimeFmt = "%d.%m.%Y"
SecondsPerDay = 3600*24

print "      date   %a  %w  %d   %j  %W  %U"
print "===================================="
do for [i=0:20] {
    t = strptime(myTimeFmt,StartDate) + i*SecondsPerDay
    myDate = strftime(myTimeFmt."  %a  %w  %d  %j  %W  %U", t)
    print sprintf("%s", myDate)
}
### end of code

gnuplot time specifiers:

%a abbreviated name of day of the week
%w day of the week, 0–6 (Sunday = 0)
%d day of the month, 01–31
%j day of the year, 1–366 
%W week of the year (week starts on Monday)
%U week of the year (week starts on Sunday)

Result:

      date   %a  %w  %d   %j  %W  %U
====================================
24.12.2020  Thu  04  24  359  52  52
25.12.2020  Fri  05  25  360  52  52
26.12.2020  Sat  06  26  361  52  52
27.12.2020  Sun  00  27  362  52  53
28.12.2020  Mon  01  28  363  53  53
29.12.2020  Tue  02  29  364  53  53
30.12.2020  Wed  03  30  365  53  53
31.12.2020  Thu  04  31  366  53  53
01.01.2021  Fri  05  01  001  01  01   ???
02.01.2021  Sat  06  02  002  01  01   ???
03.01.2021  Sun  00  03  003  00  01   ???
04.01.2021  Mon  01  04  004  01  01
05.01.2021  Tue  02  05  005  01  01
06.01.2021  Wed  03  06  006  01  01
07.01.2021  Thu  04  07  007  01  01
08.01.2021  Fri  05  08  008  01  01
09.01.2021  Sat  06  09  009  01  01
10.01.2021  Sun  00  10  010  01  02
11.01.2021  Mon  01  11  011  02  02
12.01.2021  Tue  02  12  012  02  02
13.01.2021  Wed  03  13  013  02  02

Question: Is there a workaround to fix this?

theozh asked Jan 05 '21 11:01

2 Answers

Based on the description here: https://en.wikipedia.org/wiki/ISO_week_date, I guess the essence of the ISO 8601 definition is:

  1. a week starts on Monday
  2. week 01 is the week with the first Thursday of the year
  3. a week belongs to the year that contains the majority of its days
  4. years starting or ending on a Thursday have 53 weeks, all others have 52 weeks
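
Rule 4 can be checked on its own before the full implementation. The following minimal sketch uses the same Thursday test as the code below (the helper names wday and wpy are the ones used there; the range of years is an arbitrary choice for illustration) and lists the number of ISO weeks per year:

```gnuplot
### sketch: ISO weeks per year (rule 4) for an arbitrary range of years
reset session

# weekday (0=Sun, ..., 6=Sat) of a given calendar date
wday(d,m,y) = tm_wday(strptime("%d.%m.%Y", sprintf("%02d.%02d.%04d", d, m, y)))
# 53 weeks if 1 January or 31 December is a Thursday, otherwise 52
wpy(y)      = wday(1,1,y)==4 || wday(31,12,y)==4 ? 53 : 52

do for [y=2015:2026] {
    print sprintf("%04d: %d weeks", y, wpy(y))
}
### end of code
```

In this range, 2015, 2020 and 2026 should come out with 53 weeks, all other years with 52.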

Code:

### correct week number according to ISO 8601
reset session

dow(t)      = int(tm_wday(t)) ? tm_wday(t) : 7                               # day of week 1=Mon, ..., 7=Sun
week(t)     = int((11 + tm_yday(t) - dow(t))/7)                              # "raw"week of year
wday(d,m,y) = tm_wday(strptime("%d.%m.%Y",sprintf("%02d.%02d.%04d",d,m,y)))  # week day of certain date
wpy(y)      = wday(1,1,y)==4 || wday(31,12,y)==4 ? 53 : 52                   # weeks per year
woy(t)      = week(t) < 1 ? wpy(tm_year(t)-1) : \
              week(t) > wpy(tm_year(t)) ? 1 : week(t)                        # week of year
yow(t)      = int(week(t) < 1 ? tm_year(t)-1 : week(t) > wpy(tm_year(t)) ? \
              tm_year(t)+1 : tm_year(t))                                     # year of week (could be previous, current or next)

StartDate = "24.12.2020"
myTimeFmt = "%d.%m.%Y"
SecondsPerDay = 3600*24

print "      date   %a DoW  %d   %j   YoW WoY"
print "======================================"
do for [i=0:20] {
    t = strptime(myTimeFmt,StartDate) + i*SecondsPerDay
    myDate = strftime(myTimeFmt."  %a", t)
    myDate2 = strftime("%d  %j", t)
    print sprintf("%s  %02d  %s  %04d-W%02d", myDate, dow(t), myDate2, yow(t), woy(t))
}
### end of code

Result:

      date   %a DoW  %d   %j   YoW WoY
======================================
24.12.2020  Thu  04  24  359  2020-W52
25.12.2020  Fri  05  25  360  2020-W52
26.12.2020  Sat  06  26  361  2020-W52
27.12.2020  Sun  07  27  362  2020-W52
28.12.2020  Mon  01  28  363  2020-W53
29.12.2020  Tue  02  29  364  2020-W53
30.12.2020  Wed  03  30  365  2020-W53
31.12.2020  Thu  04  31  366  2020-W53
01.01.2021  Fri  05  01  001  2020-W53
02.01.2021  Sat  06  02  002  2020-W53
03.01.2021  Sun  07  03  003  2020-W53
04.01.2021  Mon  01  04  004  2021-W01
05.01.2021  Tue  02  05  005  2021-W01
06.01.2021  Wed  03  06  006  2021-W01
07.01.2021  Thu  04  07  007  2021-W01
08.01.2021  Fri  05  08  008  2021-W01
09.01.2021  Sat  06  09  009  2021-W01
10.01.2021  Sun  07  10  010  2021-W01
11.01.2021  Mon  01  11  011  2021-W02
12.01.2021  Tue  02  12  012  2021-W02
13.01.2021  Wed  03  13  013  2021-W02

To use the week numbers, e.g. as time axis labels, it would be ideal to have this implemented for %W. Coincidentally, there was a recent bug report on SourceForge, so I assume it will be fixed in one of the next versions.
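
As long as %W itself is not corrected, one possible workaround for axis labels is to place the tics manually and label them with woy(t). This is only a sketch: it repeats the functions from the code above, and the start date, the three-week range and the plotted sine curve are placeholders chosen for illustration.

```gnuplot
### sketch: ISO week numbers as manual tic labels on a time axis
reset session

dow(t)      = int(tm_wday(t)) ? tm_wday(t) : 7
week(t)     = int((11 + tm_yday(t) - dow(t))/7)
wday(d,m,y) = tm_wday(strptime("%d.%m.%Y",sprintf("%02d.%02d.%04d",d,m,y)))
wpy(y)      = wday(1,1,y)==4 || wday(31,12,y)==4 ? 53 : 52
woy(t)      = week(t) < 1 ? wpy(tm_year(t)-1) : \
              week(t) > wpy(tm_year(t)) ? 1 : week(t)

SecondsPerDay = 3600*24
t0 = strptime("%d.%m.%Y", "28.12.2020")      # placeholder start date (a Monday)
t1 = t0 + 21*SecondsPerDay                   # placeholder range: three weeks

# one tic per Monday, labelled with the ISO week number
set xtics ()
do for [i=0:3] {
    t = t0 + i*7*SecondsPerDay
    set xtics add (sprintf("W%02d", woy(t)) t)
}

set xrange [t0:t1]
set samples 22
plot '+' using 1:(sin(($1-t0)/SecondsPerDay/3.)) with linespoints notitle   # dummy data
### end of code
```

With these placeholder dates, the four tics should be labelled W53, W01, W02 and W03.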

theozh answered Sep 30 '22 18:09


Given the ongoing pandemic and the consequent interest in plotting epidemiological data from all sources, it seemed expedient to clean up and extend gnuplot's support for week-date formats. The "New Features" section of the gnuplot documentation now lists:

• Time specifier format %W has been brought into accord with the ISO 8601 week date standard.
• Time specifier format %U has been brought into accord with the CDC/MMWR week date standard.
• New function tm_week(time, std) returns ISO or CDC standard week of year.
• New function weekdate_iso(year, week, day) converts ISO standard week date to calendar time.
• New function weekdate_cdc(year, week, day) converts CDC standard week date to calendar time.
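
As a brief illustration of these functions, the sketch below assumes a gnuplot build recent enough to include them. It also assumes, as I understand the gnuplot documentation, that std = 0 selects the ISO 8601 week and std = 1 the CDC week, and that the day argument counts from 1 (Monday for ISO, Sunday for CDC). The dates are arbitrary examples.

```gnuplot
### sketch: week functions in newer gnuplot versions
reset session

myTimeFmt = "%d.%m.%Y"
t = strptime(myTimeFmt, "01.01.2021")         # a Friday

print "ISO week: ", tm_week(t, 0)             # assumed: 0 = ISO 8601
print "CDC week: ", tm_week(t, 1)             # assumed: 1 = CDC/MMWR

# week date -> calendar date; the third argument is the day within the week
print strftime(myTimeFmt, weekdate_iso(2021, 1, 1))   # Monday of ISO week 2021-W01
print strftime(myTimeFmt, weekdate_cdc(2021, 1, 1))   # first day of CDC week 1 of 2021
### end of code
```

For 01.01.2021 the ISO week should be 53, and the ISO week date 2021-W01-1 should map back to 04.01.2021.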

Here is an example (from the online demo set) of converting data given in ISO 8601 week-date format into standard calendar dates for plotting along a gnuplot time axis.

#                   Epidemiological data
#
# Plot from data file that encodes date as an ISO 8601 "week date".
# Example:  week date 2004-W01-1 is calendar date 29 December 2003
# The data is from the European Centre for Disease Prevention and Control
# https://www.ecdc.europa.eu/

# The ECDC data file uses fields containing week date as "YYYY-WW".
# First we define a function that extracts the integer year and week
# from this string and converts it to standard time representation.

calendar(date) = weekdate_iso( int(date[1:4]), int(date[6:7]) )

set datafile separator comma
set style data lines
set key Left left reverse box samplen 2 width 2
set grid x lt 1 lw .75 lc "gray"
set tics nomirror
set border 3
set xtics time format "%b\n%Y"
set ytics format " %4.0f"

data1 = '< grep "Denmark.*cases" ECDC-weekly-national-COVID.csv'
data2 = '< grep "Sweden.*cases" ECDC-weekly-national-COVID.csv'
data3 = '< grep "Norway.*cases" ECDC-weekly-national-COVID.csv'
data4 = '< grep "Finland.*cases" ECDC-weekly-national-COVID.csv'
data5 = '< grep "Iceland.*cases" ECDC-weekly-national-COVID.csv'

set title "weekly COVID-19 cases per 100,000 people" font "/Bold,15"

plot data1 using (calendar(strcol(7))) : (1.e5*$6/$4) lw 2 title "Denmark", \
     data2 using (calendar(strcol(7))) : (1.e5*$6/$4) lw 2 title "Sweden", \
     data3 using (calendar(strcol(7))) : (1.e5*$6/$4) lw 2 title "Norway", \
     data4 using (calendar(strcol(7))) : (1.e5*$6/$4) lw 2 title "Finland", \
     data5 using (calendar(strcol(7))) : (1.e5*$6/$4) lw 2 lt 6 title "Iceland"

[Figure: plot titled "weekly COVID-19 cases per 100,000 people" with curves for Denmark, Sweden, Norway, Finland and Iceland]

Ethan answered Sep 30 '22 17:09