Message "Exception ignored" when dealing with the pandas datetime type

I have an xlsx file with a column containing dates in the format "01.01.1900 09:01:25". The file is password-protected, so I read it into a DataFrame using the win32com.client library.

Here is the code:

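import pandas as pd
import win32com.client

File = r"C:\path\to\file.xlsx"  # placeholder: path to the password-protected workbook

xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.DisplayAlerts = False
# Workbooks.Open(Filename, UpdateLinks, ReadOnly, Format, Password): open read-only, password " "
xlwb = xlApp.Workbooks.Open(File, False, True, None, " ")
xlws = xlwb.Sheets("Sheet 1")  # open the worksheet named "Sheet 1"

# Get table dimensions from the contiguous region around A1
LastRow = xlws.Range("A1").CurrentRegion.Rows.Count
LastColumn = xlws.Range("A1").CurrentRegion.Columns.Count
# Range.Value returns a tuple of row tuples
header = list(xlws.Range(xlws.Cells(1, 1), xlws.Cells(1, LastColumn)).Value[0])
content = list(xlws.Range(xlws.Cells(2, 1), xlws.Cells(LastRow, LastColumn)).Value)
xlwb.Close(False)  # close the workbook without saving

# Build the DataFrame
df = pd.DataFrame(data=content, columns=header)
print(df)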

I checked that, once imported, the dtype of that column has been automatically and correctly set to datetime64. The issue is that any time I try to do anything with a value from that column (just print it or compare it), I get a message saying:

  File "pandas\_libs\tslibs\timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info

AttributeError: 'NoneType' object has no attribute 'total_seconds'

Exception ignored in: 'pandas._libs.tslib._localize_tso'
Traceback (most recent call last):
  File "pandas\_libs\tslibs\timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Traceback (most recent call last):

Nonetheless, the code works perfectly; the warning message is just annoying.

Is there anything I can do with the datatype to avoid that warning?

asked Aug 13 '18 17:08 by DrJuzo



1 Answer

When the Excel file is opened this way, the content variable is a list of tuples.

Looking at those tuples, the datetime values carry a TimeZoneInfo object that localizes every date to a time zone, in my case "GMT Standard Time".
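Because each datetime inside those tuples carries its own tzinfo, one possible alternative (an untested sketch, not part of the original answer) is to strip the tzinfo before the DataFrame is even built. The sample below stands in for the COM result using ordinary datetimes with a fixed UTC+01:00 offset; pywin32 actually returns its own datetime subclass with a TimeZoneInfo, and naive_rows is a hypothetical helper name.

```python
import datetime as dt

import pandas as pd


def naive_rows(rows):
    """Return the rows with tzinfo removed from every datetime value.

    replace(tzinfo=None) keeps the wall-clock time exactly as stored.
    """
    return [
        tuple(v.replace(tzinfo=None) if isinstance(v, dt.datetime) else v for v in row)
        for row in rows
    ]


# Hypothetical stand-in for the COM result (assumption: a fixed offset
# instead of pywin32's TimeZoneInfo object).
tz = dt.timezone(dt.timedelta(hours=1))
content = [
    ("a", dt.datetime(1900, 1, 1, 9, 1, 25, tzinfo=tz)),
    ("b", dt.datetime(1900, 1, 2, 10, 0, 0, tzinfo=tz)),
]

df = pd.DataFrame(data=naive_rows(content), columns=["name", "when"])
print(df.dtypes)  # "when" is a plain datetime64 column with no time zone
```

Whether replace(tzinfo=None) behaves identically on pywin32's datetime subclass is an assumption worth checking on the real data.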

So once converted to a DataFrame, df.dtypes shows not just "datetime64" but "datetime64 (UTC+0:00) Dublin, Edinburgh, ...".
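A quick way to tell a timezone-aware column from a plain datetime64 one is to check whether its dtype is a pd.DatetimeTZDtype, or whether .dt.tz is set. A minimal sketch, assuming a fixed UTC+01:00 offset as a stand-in for the "GMT Standard Time" TimeZoneInfo; tz_aware_columns is a hypothetical helper name.

```python
import datetime as dt

import pandas as pd


def tz_aware_columns(df):
    """List the columns whose dtype is a timezone-aware datetime64."""
    return [col for col in df.columns if isinstance(df[col].dtype, pd.DatetimeTZDtype)]


# Assumption: a fixed offset stands in for pywin32's TimeZoneInfo.
tz = dt.timezone(dt.timedelta(hours=1))
df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "aware": [
            dt.datetime(1900, 1, 1, 9, 1, 25, tzinfo=tz),
            dt.datetime(1900, 1, 2, 10, 0, tzinfo=tz),
        ],
        "naive": [dt.datetime(1900, 1, 1, 9, 1, 25), dt.datetime(1900, 1, 2, 10, 0)],
    }
)

print(df.dtypes)          # the "aware" dtype includes the time zone
print(df["aware"].dt.tz)  # the tzinfo attached to the column
print(tz_aware_columns(df))
```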

This time zone is attached only when the Excel file is opened through win32com.client. If you remove the password, you can open the file with pandas.read_excel and see that no time zone is set for those datetimes, and the warning does not appear.

I don't know exactly why this happens, but I have a solution for the original example: the warning disappears after converting the column to a time zone recognized by the tz database, such as "UTC", or simply to None. Something like:

df["col_name"]=df["col_name"].dt.tz_convert(None)
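Note that tz_convert(None) first converts to UTC and then drops the zone, so the wall-clock times shift by the offset (for "GMT Standard Time" that is one hour during summer time). If the original local times should be kept, tz_localize(None) drops the zone without shifting. The sketch below compares the two on a hypothetical UTC+01:00 timestamp; strip_timezones is a hypothetical helper, and whether tz_localize(None) also silences the warning with pywin32's TimeZoneInfo has not been verified here.

```python
import datetime as dt

import pandas as pd

# Hypothetical summer timestamp at UTC+01:00 (British Summer Time offset).
bst = dt.timezone(dt.timedelta(hours=1))
s = pd.Series([dt.datetime(2018, 8, 13, 17, 8, 0, tzinfo=bst)])

as_utc = s.dt.tz_convert(None)      # convert to UTC, then drop the zone
as_local = s.dt.tz_localize(None)   # drop the zone, keep local wall-clock time

print(as_utc[0])    # 2018-08-13 16:08:00
print(as_local[0])  # 2018-08-13 17:08:00


def strip_timezones(df, keep_wall_time=True):
    """Return a copy of df with every tz-aware datetime column made naive."""
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.DatetimeTZDtype):
            if keep_wall_time:
                out[col] = out[col].dt.tz_localize(None)
            else:
                out[col] = out[col].dt.tz_convert(None)
    return out
```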
answered Nov 05 '22 01:11 by DrJuzo