
How to remove unconverted data from a Python datetime object

I have a database of mostly correct datetimes, but a few are broken, like so: Sat Dec 22 12:34:08 PST 20102015

Without the invalid year, this was working for me:

    end_date = soup('tr')[4].contents[1].renderContents()
    end_date = time.strptime(end_date, "%a %b %d %H:%M:%S %Z %Y")
    end_date = datetime.fromtimestamp(time.mktime(end_date))

But once I hit an object with an invalid year I get ValueError: unconverted data remains: 2, which is great, but I'm not sure how best to strip the bad characters out of the year. There are anywhere from 2 to 6 unconverted characters.

Any pointers? I would just slice end_date, but I'm hoping there is a datetime-safe strategy.

Ben Keating asked Feb 18 '11 18:02




2 Answers

Unless you want to rewrite strptime (a very bad idea), the only real option you have is to slice end_date and chop off the extra characters at the end, assuming that this will give you the correct result you intend.

For example, you can catch the ValueError, slice, and try again:

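    import time

    def parse_prefix(line, fmt):
        try:
            t = time.strptime(line, fmt)
        except ValueError as v:
            if len(v.args) > 0 and v.args[0].startswith('unconverted data remains: '):
                # The message prefix is 26 characters long; the rest of the
                # message is the leftover text, so drop that many characters.
                line = line[:-(len(v.args[0]) - 26)]
                t = time.strptime(line, fmt)
            else:
                raise
        return t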

For example:

    parse_prefix(
        '2015-10-15 11:33:20.738 45162 INFO core.api.wsgi yadda yadda.',
        '%Y-%m-%d %H:%M:%S'
    )
    # -> time.struct_time(tm_year=2015, tm_mon=10, tm_mday=15, tm_hour=11, tm_min=33, ...
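Since the question ultimately wants a datetime rather than a struct_time, the same catch-and-slice idea can be sketched with datetime.strptime directly. This is only a variant under assumptions: the name parse_datetime_prefix is made up for illustration, and it relies on the error message starting with 'unconverted data remains: ', just as the function above does.

```python
from datetime import datetime

# Message prefix that strptime uses when text is left over after parsing.
PREFIX = "unconverted data remains: "


def parse_datetime_prefix(text, fmt):
    """Parse the leading part of text that matches fmt (hypothetical helper)."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError as exc:
        message = str(exc)
        if not message.startswith(PREFIX):
            # Some other parsing problem: let the caller see it.
            raise
        leftover = message[len(PREFIX):]
        # Drop exactly the leftover characters from the end and retry once.
        return datetime.strptime(text[:len(text) - len(leftover)], fmt)
```

A format without %Z is used in testing because %Z only accepts a few zone names that depend on the machine's settings.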
Adam Rosenfield answered Sep 27 '22 15:09


Yeah, I'd just chop off the extra numbers. Assuming they are always appended to the datestring, then something like this would work:

    end_date = end_date.split(" ")
    end_date[-1] = end_date[-1][:4]
    end_date = " ".join(end_date)

I was going to try to get the number of excess digits from the exception, but on my installed versions of Python (2.6.6 and 3.1.2) that information isn't actually there; it just says that the data does not match the format. Of course, you could just continue lopping off digits one at a time and re-parsing until you don't get an exception.
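The "lop off one character and re-parse" idea could look roughly like this. It is a sketch, not a tested recipe for the asker's data: the name parse_trimming and the max_trim default of 6 (taken from the "2 to 6 unconverted characters" in the question) are assumptions.

```python
from datetime import datetime


def parse_trimming(text, fmt, max_trim=6):
    """Retry parsing, dropping one more trailing character after each failure."""
    for trim in range(max_trim + 1):
        candidate = text[:len(text) - trim]
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            # Still does not match the format; trim one more character.
            continue
    raise ValueError("could not parse %r with format %r" % (text, fmt))
```

This does not depend on the wording of the exception message, at the cost of a few extra parse attempts per bad row.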

You could also write a regex that will match only valid dates, including the right number of digits in the year, but that seems like overkill.
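For what it's worth, the regex route might be sketched as below. The pattern is an assumption based only on the one sample string in the question (weekday, month, day, time, zone, year); it keeps the first four year digits and discards whatever follows, and clean_date_string is a made-up name.

```python
import re

# Weekday, month, day, HH:MM:SS, zone name, then exactly four year digits.
# Anything after the fourth year digit is left out of the match.
DATE_PREFIX = re.compile(
    r"[A-Za-z]{3} [A-Za-z]{3} +\d{1,2} \d{2}:\d{2}:\d{2} [A-Za-z]+ \d{4}"
)


def clean_date_string(text):
    """Return the leading date portion of text, with the year cut to 4 digits."""
    match = DATE_PREFIX.match(text)
    if match is None:
        raise ValueError("no date found at the start of %r" % (text,))
    return match.group(0)
```

The cleaned string can then be handed to strptime with the original format.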

kindall answered Sep 27 '22 17:09