I have a xlsx file with a column containing Dates in the format: "01.01.1900 09:01:25". The file is password protected so I convert it to a dataframe by means of win32com.client library.
Here is the code:
import pandas as pd
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.DisplayAlerts = False
xlwb = xlApp.Workbooks.Open(File, False, True, None, " ") #Open Workbook password " "
xlws = xlwb.Sheets("Sheet 1") #Open Sheet 1
#Get table dimensions
LastRow = xlws.Range("A1").CurrentRegion.Rows.Count
LastColumn = xlws.Range("A1").CurrentRegion.Columns.Count
header=list((xlws.Range(xlws.Cells(1, 1), xlws.Cells(1, LastColumn)).Value)[0])
content = list(xlws.Range(xlws.Cells(2, 1), xlws.Cells(LastRow, LastColumn)).Value)
#Get the dataframe
df=pd.DataFrame(data=content, columns=header)
print (df)
I checked that once imported dtype as been automatically and correctly assigned to datetime64 for that column. The issue is that any time I try to do whatever with any value of that column (just print it or compare it) I get a meesage saying:
File "pandas\_libs\tslibs\timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Exception ignored in: 'pandas._libs.tslib._localize_tso'
Traceback (most recent call last):
File "pandas\_libs\tslibs\timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Traceback (most recent call last):
Nonetheless the code works perfectly, but the warning message is annoying me.
Is there anything I can do with the datatype to avoid that warning?
pandas supports converting integer or float epoch times to Timestamp and DatetimeIndex . The default unit is nanoseconds, since that is how Timestamp objects are stored internally.
The date-time default format is “YYYY-MM-DD”. Hence, December 8, 2020, in the date format will be presented as “2020-12-08”. The datetime format can be changed and by changing we mean changing the sequence and style of the format.
datetime object. Timestamp is the pandas equivalent of python's Datetime and is interchangeable with it in most cases. It's the type used for the entries that make up a DatetimeIndex, and other timeseries oriented data structures in pandas.
dt. date attribute to return the date property of the underlying data of the given Series object.
Opening the excel in this way, the content
variable is a list of tuples.
Having a look on those tuples there is a TimeZoneInfo that localizes all the dates in a kind of time zone, in my case "GMT Standard Time".
So once converted to a dataframe, when doing df.dtypes
the result is not only "datetime64" but "datetime64 (UTC+0:00) Dublin, Edimburg, ..."
This time zone setting only happens when opening the excel file through win32com.client
. If you removed the password, you can open it with pandas.read_excel
and discover that there is no timezone set for those datetimes and the mentioned warning does not appear.
Don't know exactly the reason it happens, but I have a solution for the original example. The warning dissapears setting a timezone recognized by tz database as "UTC"
or simply None
. Something like:
df["col_name"]=df["col_name"].dt.tz_convert(None)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With