
Get the format in dateutil.parse

Is there a way to get the "format" after parsing a date in dateutil? For example, something like:

    >>> x = parse("2014-01-01 00:12:12")
    datetime.datetime(2014, 1, 1, 0, 12, 12)

    x.get_original_string_format()
    YYYY-MM-DD HH:MM:SS # %Y-%m-%d %H:%M:%S

    # Or, passing the date-string directly
    get_original_string_format("2014-01-01 00:12:12")
    YYYY-MM-DD HH:MM:SS # %Y-%m-%d %H:%M:%S

Update: I'd like to add a bounty to this question to see if someone can add an answer that does the equivalent: returning the string format of a common date string passed in. It can use dateutil if you want, but it doesn't have to. Hopefully we'll get some creative solutions here.

David542 asked Dec 22 '18 01:12



2 Answers

Is there a way to get the "format" after parsing a date in dateutil?

Not possible with dateutil. The problem is that dateutil never has the format as an intermediate result at any point during parsing, because it detects the individual components of the datetime separately. Take a look at the parser's source code, which is not easy to read.
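As a rough illustration of what component-wise detection looks like, and of why a format string has to be rebuilt as a separate step, here is a minimal sketch using only the standard library. It is not dateutil's algorithm. The helper name `guess_iso_like_format` is hypothetical, and it deliberately handles only year-first numeric dates, because strings such as 01/02/03 have no single correct format.

```python
import re
from datetime import datetime

# Year-first numeric date, one consistent separator, optional time part.
_ISO_LIKE = re.compile(
    r"(\d{4})([-/.])(\d{2})\2(\d{2})"
    r"(?:([ T])(\d{2}):(\d{2})(?::(\d{2}))?)?"
)

def guess_iso_like_format(datestring):
    """Return a strptime format for 'YYYY-MM-DD[ HH:MM[:SS]]'-shaped input, else None."""
    m = _ISO_LIKE.fullmatch(datestring)
    if m is None:
        return None
    sep = m.group(2)
    fmt = f"%Y{sep}%m{sep}%d"
    if m.group(5) is not None:          # a time part is present
        fmt += f"{m.group(5)}%H:%M"
        if m.group(8) is not None:      # seconds are present
            fmt += ":%S"
    # Round-trip check: reject shapes that look right but are not valid dates.
    try:
        datetime.strptime(datestring, fmt)
    except ValueError:
        return None
    return fmt

print(guess_iso_like_format("2014-01-01 00:12:12"))  # %Y-%m-%d %H:%M:%S
```

Anything outside that narrow family returns None, which is the honest answer whenever the components alone do not pin down a single format.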

alecxe answered Oct 19 '22 22:10


I don't know of a way that you can return the parsed format from dateutil (or any other python timestamp parser that I know of).

Implementing your own timestamp parsing function that returns the format along with the datetime object is fairly trivial using datetime.strptime(), but doing it efficiently against a broadly useful list of possible timestamp formats is not.

The following example utilizes a list of just over 50 formats adapted from one of the top hits from a quick search for timestamp formats. It does not even scratch the surface of the wide variety of formats parsed by dateutil. It tests each format in sequence until it finds a match or exhausts all formats in the list (likely much less efficient than the dateutil approach of locating the various datetime parts independently as noted in the answer from @alecxe).
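One cheap way to soften the cost of a sequential scan, when many strings share a format (lines of one log file, for instance), is to remember the format that matched last and try it first. The sketch below assumes that usage pattern; the class name `FormatGuesser` is hypothetical and not part of any library.

```python
from datetime import datetime

class FormatGuesser:
    """Try the most recently matched format first, then the rest in list order."""

    def __init__(self, formats):
        self.formats = list(formats)
        self.last_match = None  # format that parsed the previous string, if any

    def parse(self, datestring):
        candidates = self.formats
        if self.last_match is not None:
            # Move the last successful format to the front of the scan.
            candidates = [self.last_match] + [
                f for f in self.formats if f != self.last_match
            ]
        for f in candidates:
            try:
                d = datetime.strptime(datestring, f)
            except ValueError:
                continue
            self.last_match = f
            return d, f
        return None, None  # nothing matched

guesser = FormatGuesser(['%Y-%m-%d', '%H:%M:%S', '%d/%b/%Y %H:%M:%S'])
print(guesser.parse('23/Apr/2017 11:42:35'))
print(guesser.parse('24/Apr/2017 09:00:01'))  # matches on the first attempt
```

Note the trade-off: for strings that more than one format can parse, trying the cached format first can change which format is reported.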

In addition, I have included some example timestamp formats that include time zone names (instead of offsets). If you run the example function below against those particular datetime strings, you may find that it returns "Unable to parse format" even though I have included matching formats using the %Z directive. Some explanation for the challenges with using %Z to handle time zone names can be found in issue 22377 at bugs.python.org (just to highlight another non-trivial aspect of implementing your own datetime parsing function).
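As far as I understand the Python documentation, strptime() only accepts UTC, GMT, and the local machine's own zone names for %Z, which is why a string ending in PDT may parse on one machine and fail on another. One possible workaround is to handle the trailing abbreviation by hand. The sketch below assumes the abbreviation is the last space-separated token; `TZ_OFFSETS` and `parse_with_tz_abbrev` are hypothetical names, and the offsets in the table are assumptions you would have to choose for your own data, since abbreviations are not unique worldwide.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table; extend it with the abbreviations your data uses.
TZ_OFFSETS = {
    "UTC": timedelta(0),
    "PST": timedelta(hours=-8),
    "PDT": timedelta(hours=-7),
}

def parse_with_tz_abbrev(datestring, fmt_without_zone):
    """Parse '<timestamp> <ABBR>' without relying on the %Z directive."""
    body, _, abbrev = datestring.rpartition(" ")
    if abbrev not in TZ_OFFSETS:
        raise ValueError(f"unknown time zone abbreviation: {abbrev!r}")
    naive = datetime.strptime(body, fmt_without_zone)
    # Attach a fixed-offset tzinfo built from the lookup table.
    return naive.replace(tzinfo=timezone(TZ_OFFSETS[abbrev], abbrev))

d = parse_with_tz_abbrev("2017 Mar 03 05:12:41.211 PDT", "%Y %b %d %H:%M:%S.%f")
print(d.isoformat())  # 2017-03-03T05:12:41.211000-07:00
```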

With all of those caveats, if you are dealing with a manageable set of potential formats, implementing something simple like the below may get you what you need.

Example function that attempts to match a datetime string against a list of formats and return the datetime object along with the matched format:

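    from datetime import datetime

    def parse_timestamp(datestring, formats):
        # Try each format in order and return the first one that parses.
        for f in formats:
            try:
                d = datetime.strptime(datestring, f)
            except ValueError:
                # strptime raises ValueError when the string does not match.
                continue
            return (d, f)
        # No format matched: hand back the original string with a marker.
        return (datestring, 'Unable to parse format')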

Example formats and datetime strings adapted from "Timestamps, Time Zones, Time Ranges, and Date Formats":

    formats = [
        '%Y-%m-%dT%H:%M:%S*%f%z', '%Y %b %d %H:%M:%S.%f %Z', '%b %d %H:%M:%S %z %Y',
        '%d/%b/%Y:%H:%M:%S %z', '%b %d, %Y %I:%M:%S %p', '%b %d %Y %H:%M:%S',
        '%b %d %H:%M:%S %Y', '%b %d %H:%M:%S %z', '%b %d %H:%M:%S',
        '%Y-%m-%dT%H:%M:%S%z', '%Y-%m-%dT%H:%M:%S.%f%z', '%Y-%m-%d %H:%M:%S %z',
        '%Y-%m-%d %H:%M:%S%z', '%Y-%m-%d %H:%M:%S,%f', '%Y/%m/%d*%H:%M:%S',
        '%Y %b %d %H:%M:%S.%f*%Z', '%Y %b %d %H:%M:%S.%f', '%Y-%m-%d %H:%M:%S,%f%z',
        '%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d %H:%M:%S.%f%z', '%Y-%m-%dT%H:%M:%S.%f',
        '%Y-%m-%dT%H:%M:%S', '%Y-%m-%dT%H:%M:%S%Z', '%Y-%m-%dT%H:%M:%S.%f',
        '%Y-%m-%dT%H:%M:%S', '%Y-%m-%d*%H:%M:%S:%f', '%Y-%m-%d*%H:%M:%S',
        '%y-%m-%d %H:%M:%S,%f %z', '%y-%m-%d %H:%M:%S,%f', '%y-%m-%d %H:%M:%S',
        '%y/%m/%d %H:%M:%S', '%y%m%d %H:%M:%S', '%Y%m%d %H:%M:%S.%f',
        '%m/%d/%y*%H:%M:%S', '%m/%d/%Y*%H:%M:%S', '%m/%d/%Y*%H:%M:%S*%f',
        '%m/%d/%y %H:%M:%S %z', '%m/%d/%Y %H:%M:%S %z', '%H:%M:%S',
        '%H:%M:%S.%f', '%H:%M:%S,%f', '%d/%b %H:%M:%S,%f',
        '%d/%b/%Y:%H:%M:%S', '%d/%b/%Y %H:%M:%S', '%d-%b-%Y %H:%M:%S',
        '%d-%b-%Y %H:%M:%S.%f', '%d %b %Y %H:%M:%S', '%d %b %Y %H:%M:%S*%f',
        '%m%d_%H:%M:%S', '%m%d_%H:%M:%S.%f', '%m/%d/%Y %I:%M:%S %p:%f',
        '%m/%d/%Y %I:%M:%S %p',
    ]

    datestrings = [
        '2018-08-20T13:20:10*633+0000', '2017 Mar 03 05:12:41.211 PDT', 'Jan 21 18:20:11 +0000 2017',
        '19/Apr/2017:06:36:15 -0700', 'Dec 2, 2017 2:39:58 AM', 'Jun 09 2018 15:28:14',
        'Apr 20 00:00:35 2010', 'Sep 28 19:00:00 +0000', 'Mar 16 08:12:04',
        '2017-10-14T22:11:20+0000', '2017-07-01T14:59:55.711+0000', '2017-08-19 12:17:55 -0400',
        '2017-08-19 12:17:55-0400', '2017-06-26 02:31:29,573', '2017/04/12*19:37:50',
        '2018 Apr 13 22:08:13.211*PDT', '2017 Mar 10 01:44:20.392', '2017-03-10 14:30:12,655+0000',
        '2018-02-27 15:35:20.311', '2017-03-12 13:11:34.222-0700', '2017-07-22T16:28:55.444',
        '2017-09-08T03:13:10', '2017-03-12T17:56:22-0700', '2017-11-22T10:10:15.455',
        '2017-02-11T18:31:44', '2017-10-30*02:47:33:899', '2017-07-04*13:23:55',
        '11-02-11 16:47:35,985 +0000', '10-06-26 02:31:29,573', '10-04-19 12:00:17',
        '06/01/22 04:11:05', '150423 11:42:35', '20150423 11:42:35.173',
        '08/10/11*13:33:56', '11/22/2017*05:13:11', '05/09/2017*08:22:14*612',
        '04/23/17 04:34:22 +0000', '10/03/2017 07:29:46 -0700', '11:42:35',
        '11:42:35.173', '11:42:35,173', '23/Apr 11:42:35,173',
        '23/Apr/2017:11:42:35', '23/Apr/2017 11:42:35', '23-Apr-2017 11:42:35',
        '23-Apr-2017 11:42:35.883', '23 Apr 2017 11:42:35', '23 Apr 2017 10:32:35*311',
        '0423_11:42:35', '0423_11:42:35.883', '8/5/2011 3:31:18 AM:234',
        '9/28/2011 2:23:15 PM',
    ]

Example usage:

    print(parse_timestamp(datestrings[0], formats))
    # OUTPUT
    # (datetime.datetime(2018, 8, 20, 13, 20, 10, 633000, tzinfo=datetime.timezone.utc), '%Y-%m-%dT%H:%M:%S*%f%z')
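One more caveat with a first-match scan is that the order of the format list decides the answer for ambiguous strings. A small sketch that collects every matching format instead of stopping at the first makes the ambiguity visible. The helper name `matching_formats` is hypothetical, and the three candidate formats are chosen purely for illustration.

```python
from datetime import datetime

def matching_formats(datestring, formats):
    """Return every (datetime, format) pair that successfully parses the string."""
    matches = []
    for f in formats:
        try:
            matches.append((datetime.strptime(datestring, f), f))
        except ValueError:
            continue
    return matches

candidates = ['%y/%m/%d %H:%M:%S', '%m/%d/%y %H:%M:%S', '%d/%m/%y %H:%M:%S']
for d, f in matching_formats('06/01/22 04:11:05', candidates):
    print(f, '->', d.isoformat())
# %y/%m/%d %H:%M:%S -> 2006-01-22T04:11:05
# %m/%d/%y %H:%M:%S -> 2022-06-01T04:11:05
# %d/%m/%y %H:%M:%S -> 2022-01-06T04:11:05
```

If more than one format comes back, the string alone cannot tell you which one was intended, so you need outside knowledge about the data source to pick.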
benvc answered Oct 19 '22 22:10