
Lua ISO 8601 datetime parsing pattern

I'm trying to parse a full ISO8601 datetime from JSON data in Lua. I'm having trouble with the match pattern.

So far, this is what I have:

-- Example datetime string 2011-10-25T00:29:55.503-04:00
local datetime = "2011-10-25T00:29:55.503-04:00"
local pattern = "(%d+)%-(%d+)%-(%d+)T(%d+):(%d+):(%d+)%.(%d+)"
local xyear, xmonth, xday, xhour, xminute, 
        xseconds, xmillies, xoffset = datetime:match(pattern)

local convertedTimestamp = os.time({year = xyear, month = xmonth, 
        day = xday, hour = xhour, min = xminute, sec = xseconds})

I'm stuck on how to handle the timezone in the pattern, because Lua patterns have no logical OR that would handle a -, a + or no offset at all. I know Lua's os.time function doesn't support time zones, but at least I would know how the time needed to be adjusted.

I've considered stripping off everything after the "." (milliseconds and timezone), but then I really wouldn't have a valid datetime. The milliseconds are not all that important and I wouldn't mind losing them, but the timezone changes things.

Note: somebody may have much better code for doing this, and I'm not married to mine; I just need to get something useful out of the datetime string :)

asked Oct 27 '11 03:10 by Brill Pappin


3 Answers

The full ISO 8601 format can't be parsed with a single pattern match; there is too much variation.

Some examples from the Wikipedia page:

  • There is a "compressed" format that doesn't separate the numbers: YYYYMMDD vs YYYY-MM-DD
  • The day can be omitted: YYYY-MM-DD and YYYY-MM are both valid dates
  • The ordinal date is also valid: YYYY-DDD, where DDD is the day of the year (001 to 365, or 366 in leap years)
  • When representing the time, the minutes and seconds can be omitted: hh:mm:ss, hh:mm and hh are all valid times
  • Moreover, time also has a compressed version: hhmmss, hhmm
  • On top of that, time accepts decimal fractions of its lowest-order element, using either a dot or a comma as the separator. 14:30,5, 1430,5, 14:30.5 and 1430.5 all represent 14 hours and 30.5 minutes, i.e. 14:30:30.
  • Finally, the timezone section is optional. When present, it can be the letter Z, ±hh:mm, ±hh or ±hhmm.

So, there are lots of possible exceptions to take into account, if you are going to parse according to the full spec. In that case, your initial code might look like this:

function parseDateTime(str)
  local Y,M,D = parseDate(str)
  local h,m,s = parseTime(str)
  local oh,om = parseOffset(str)
  -- subtract the offset to get UTC: 00:29 at -04:00 is 04:29 UTC
  return os.time({year=Y, month=M, day=D, hour=(h-oh), min=(m-om), sec=s})
end

And then you would have to create parseDate, parseTime and parseOffset. The latter should return the time offset from UTC, while the first two would have to take into account things like compressed formats, time fractions, comma or dot separators, and the like.
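A minimal sketch of parseDate along these lines follows. It assumes the date part is everything before the "T" and only handles the calendar (extended and compressed), ordinal and reduced-precision forms listed above; week dates are not covered. Returning an ordinal date as "January, day DDD" is a shortcut that relies on os.time normalizing out-of-range fields.

```lua
-- Sketch of parseDate (assumption: the date part is everything before
-- an optional "T"). Returns year, month, day as numbers, or nil.
-- Week dates (YYYY-Www-D) are not handled.
local function parseDate(str)
  local date = str:match("^([^T]+)")
  if not date then return nil end

  -- calendar dates: YYYY-MM-DD (extended) or YYYYMMDD (compressed)
  local Y, M, D = date:match("^(%d%d%d%d)%-(%d%d)%-(%d%d)$")
  if not Y then Y, M, D = date:match("^(%d%d%d%d)(%d%d)(%d%d)$") end
  if Y then return tonumber(Y), tonumber(M), tonumber(D) end

  -- ordinal dates: YYYY-DDD or YYYYDDD, DDD = day of the year (001-366);
  -- returned as "January, day DDD", which os.time normalizes
  local doy
  Y, doy = date:match("^(%d%d%d%d)%-?(%d%d%d)$")
  if Y then return tonumber(Y), 1, tonumber(doy) end

  -- reduced precision: YYYY-MM (day omitted) or YYYY alone
  Y, M = date:match("^(%d%d%d%d)%-(%d%d)$")
  if Y then return tonumber(Y), tonumber(M), 1 end
  Y = date:match("^(%d%d%d%d)$")
  if Y then return tonumber(Y), 1, 1 end

  return nil
end
```

Because every pattern is anchored with "^" and "$", the forms can't be confused with each other: an eight-digit compressed date, a seven-digit ordinal date and a "YYYY-MM" date each match exactly one branch.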

parseDate will likely use the "^" character at the beginning of its pattern matches, since the date has to be at the beginning of the string. parseTime's patterns will likely start with "T". And parseOffset's will end with "$", since the time offsets, when they exist, are at the end.
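Following the same idea, a possible parseTime reads the digits, colons and fraction separators right after the "T" and applies any fraction to the lowest-order element present, as described in the list above. This is a sketch under those assumptions: it returns 0, 0, 0 when there is no time part, nil for a malformed one, and the seconds may be fractional.

```lua
-- Sketch of parseTime: hours, minutes, seconds (seconds may be
-- fractional); 0, 0, 0 if there is no "T" time part; nil if malformed.
local function parseTime(str)
  -- time part: digits, colons and fraction separators after the "T",
  -- stopping at a "Z" or a +/- offset
  local time = str:match("T([%d:.,]+)")
  if not time then return 0, 0, 0 end

  -- split off a decimal fraction written with a dot or a comma
  local body, frac = time:match("^([%d:]+)[.,](%d+)$")
  body = body or time
  frac = frac and tonumber("0." .. frac) or 0

  local digits = body:gsub(":", "")  -- hh:mm:ss -> hhmmss
  if #digits % 2 ~= 0 then return nil end
  local h, m, s = digits:match("^(%d%d)(%d?%d?)(%d?%d?)$")
  if not h then return nil end
  h, m, s = tonumber(h), tonumber(m) or 0, tonumber(s) or 0

  -- the fraction belongs to the lowest-order element present
  local unit = (#digits == 2 and 3600) or (#digits == 4 and 60) or 1
  local total = h * 3600 + m * 60 + s + frac * unit
  return math.floor(total / 3600), math.floor(total % 3600 / 60), total % 60
end
```

With this, "T14:30,5", "T1430,5", "T14:30.5" and "T1430.5" all come back as 14, 30, 30. Lua 5.3 and later make os.time reject non-integer fields, so pass math.floor(s) rather than s as sec when feeding these values to os.time.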

A "full ISO" parseOffset function might look similar to this:

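function parseOffset(str)
  -- only look after the "T": in a bare date such as "2011-10-25" the
  -- "-25" would otherwise be mistaken for an offset
  local time = str:match("T(.*)$") or ""
  if time:sub(-1)=="Z" then return 0,0 end -- ends with Z, Zulu time

  -- matches ±hh:mm, ±hhmm or ±hh; else returns nils
  local sign, oh, om = time:match("([-+])(%d%d):?(%d?%d?)$")
  if not sign then return 0,0 end -- no offset: treated as UTC here
  if om == "" then om = "00" end  -- the ±hh form has no minutes

  return tonumber(sign .. oh), tonumber(sign .. om)
end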

By the way, I'm assuming that your computer is working in UTC time. os.time interprets the table you give it as local time, so if that's not the case, you will have to include an additional offset on your hours and minutes to account for that:

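function parseDateTime(str)
  local Y,M,D =   parseDate(str)
  local h,m,s =   parseTime(str)
  local oh,om =   parseOffset(str)
  local loh,lom = getLocalUTCOffset()
  -- convert to UTC (subtract the string's offset), then to local time
  -- (add the machine's offset), because os.time expects local time
  return os.time({year=Y, month=M, day=D, hour=(h-oh+loh), min=(m-om+lom), sec=s})
end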
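If you would rather not depend on the machine's time zone at all, a different technique (not the one this answer uses) is to compute the UTC timestamp arithmetically and skip os.time entirely. The sketch below does that with Howard Hinnant's days_from_civil algorithm for the proleptic Gregorian calendar. isoToUTC and daysFromCivil are hypothetical names; only the extended YYYY-MM-DDThh:mm:ss form with an optional fraction and offset is handled, and a missing offset is treated as UTC.

```lua
-- Days from 1970-01-01 to y-m-d in the proleptic Gregorian calendar
-- (Howard Hinnant's days_from_civil algorithm).
local function daysFromCivil(y, m, d)
  if m <= 2 then y = y - 1 end   -- Jan/Feb count as the end of the previous year
  local era = math.floor(y / 400)
  local yoe = y - era * 400                            -- year of era, 0..399
  local mp = (m + 9) % 12                              -- March = 0 ... February = 11
  local doy = math.floor((153 * mp + 2) / 5) + d - 1   -- day of year, 0..365
  local doe = yoe * 365 + math.floor(yoe / 4) - math.floor(yoe / 100) + doy
  return era * 146097 + doe - 719468
end

-- Hypothetical isoToUTC: parses YYYY-MM-DDThh:mm:ss[.fff][Z|±hh[:mm]]
-- into UTC epoch seconds plus the fractional second; nil if no match.
local function isoToUTC(str)
  local Y, M, D, h, m, s, rest =
    str:match("^(%d%d%d%d)%-(%d%d)%-(%d%d)T(%d%d):(%d%d):(%d%d)(.*)$")
  if not Y then return nil end

  local frac = rest:match("^[.,](%d+)")   -- e.g. "503" from ".503"

  -- offset east of UTC in seconds; "Z" or no offset counts as UTC here
  local offset = 0
  local sign, oh, om = rest:match("([-+])(%d%d):?(%d?%d?)$")
  if sign then
    offset = tonumber(oh) * 3600 + (tonumber(om) or 0) * 60
    if sign == "-" then offset = -offset end
  end

  local days = daysFromCivil(tonumber(Y), tonumber(M), tonumber(D))
  local wall = days * 86400 + tonumber(h) * 3600 + tonumber(m) * 60 + tonumber(s)
  -- the clock reading minus its offset is the UTC instant
  return wall - offset, frac and tonumber("0." .. frac) or 0
end
```

The result can be turned back into a UTC string with os.date and a leading "!" in the format, for example os.date("!%Y-%m-%d %H:%M:%S", t).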

To get your local offset you might want to look at http://lua-users.org/wiki/TimeZone .
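One possible sketch of the getLocalUTCOffset helper used in parseDateTime (the name comes from the code above; the body is an assumption) formats the same instant as both UTC and local time with os.date, then lets os.time convert both tables the same way so that their difference is the offset. Setting isdst to false on both tables keeps daylight saving time from being applied to only one of them.

```lua
-- Sketch of getLocalUTCOffset: the local UTC offset at time t (default:
-- now) as signed hours and minutes, e.g. -4, 0 or 5, 30.
local function getLocalUTCOffset(t)
  t = t or os.time()
  local utc = os.date("!*t", t)   -- broken-down UTC time
  local loc = os.date("*t", t)    -- broken-down local time
  -- interpret both tables identically (no DST) so that os.time's own
  -- local-time conversion cancels out in the difference
  utc.isdst, loc.isdst = false, false
  local secs = os.difftime(os.time(loc), os.time(utc))
  local sign = secs < 0 and -1 or 1
  secs = math.abs(secs)
  return sign * math.floor(secs / 3600), sign * math.floor(secs % 3600 / 60)
end
```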

I hope this helps. Regards!

answered Nov 03 '22 10:11 by kikito


There is also the luadate package, which supports ISO 8601 (you probably want the patched version).

answered Nov 03 '22 10:11 by Brian M. Hunt


Here is a simple parseDate function for ISO dates. Note that I'm using "now" as a fallback. This may or may not work for you. YMMV 😉.

--[[
    Parse date given in any of supported forms.

    Note! For an unrecognised format this returns now.

    @param str ISO date. Formats:
        Y-m-d
        Y-m -- this will assume the 1st day of the month
        Y -- this will assume 1st January
]]
function parseDate(str)
    local y, m, d = str:match("(%d%d%d%d)%-?(%d?%d?)%-?(%d?%d?)$")
    -- fallback to now
    if y == nil then
        return os.time()
    end
    -- defaults
    if m == '' then
        m = 1
    end
    if d == '' then
        d = 1
    end
    -- create time
    return os.time{year=y, month=m, day=d, hour=0}
end
--[[
--Tests:
print(  os.date( "%Y-%m-%d", parseDate("2019-12-28") )  )
print(  os.date( "%Y-%m-%d", parseDate("2019-12") )  )
print(  os.date( "%Y-%m-%d", parseDate("2019") )  )
]]
answered Nov 03 '22 11:11 by Nux