
Pandas - convert float to proper datetime or time object

I have an observational data set that contains weather information. Each column holds a specific field, with the date and time in two separate columns. The time column contains times of day such as 0000, 0600, and so on up to 2300. I want to filter the data set by a time window, for example between 0000 UTC and 0600 UTC. When I read the data file into a pandas DataFrame, the time column is read as float by default. When I try to convert it to a datetime object, I get a format that I am unable to work with. A code example is given below:

import pandas as pd
import datetime as dt 
df = pd.read_excel("test.xlsx") 
df.head()

which produces the following result:

       tdate   itime moonph  speed   ...          qnh  windir maxtemp mintemp
0  01-Jan-17  1000.0    NM7      5   ...    $1,011.60    60.0  $32.60  $22.80
1  01-Jan-17  1000.0    NM7      2   ...    $1,015.40   999.0  $32.60  $22.80
2  01-Jan-17  1030.0    NM7      4   ...    $1,015.10    60.0  $32.60  $22.80
3  01-Jan-17  1100.0    NM7      3   ...    $1,014.80   999.0  $32.60  $22.80
4  01-Jan-17  1130.0    NM7      5   ...    $1,014.60   270.0  $32.60  $22.80

Then I extracted the time column with the following lines:

df["time"] = df.itime

df["time"]

0       1000.0
1       1000.0
2       1030.0
3       1100.0
4       1130.0
5       1200.0
6       1230.0
7       1300.0
8       1330.0
.
.
3261    2130.0
3262    2130.0
3263     600.0
3264     630.0
3265     730.0
3266     800.0
3267     830.0
3268    1900.0
3269    1930.0
3270    2000.0

Name: time, Length: 3279, dtype: float64

Then I tried to convert the time column to a datetime object:

df["time"] = pd.to_datetime(df.itime)

which produced the following result:

df["time"]

0      1970-01-01 00:00:00.000001000
1      1970-01-01 00:00:00.000001000
2      1970-01-01 00:00:00.000001030
3      1970-01-01 00:00:00.000001100

It appears that the data was successfully converted to datetime objects. However, the values were interpreted as nanoseconds since the epoch (1000.0 became 00:00:00.000001000) rather than as hours and minutes, which makes filtering difficult.

The final data format I would like to get is either:

1970-01-01 06:00:00

or

06:00

Any help is appreciated.

sundar_ima asked Jan 22 '19 17:01


2 Answers

When you read the Excel file, specify the dtype of the itime column as str:

df = pd.read_excel("test.xlsx", dtype={'itime':str})

Then you will have a time column of strings, which looks like this example:

df = pd.DataFrame({'itime':['2300', '0100', '0500', '1000']})

Then specify the format and convert to time:

df['Time'] = pd.to_datetime(df['itime'], format='%H%M').dt.time

    itime   Time
0   2300    23:00:00
1   0100    01:00:00
2   0500    05:00:00
3   1000    10:00:00
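
To get from the converted column to the filtering the question asks about (0000 UTC to 0600 UTC), something like the following might work. This is only a sketch: the sample values and the names start, end, mask and filtered are illustrative and not part of the original data.

```python
import datetime as dt
import pandas as pd

# Hypothetical sample standing in for the 'itime' column read as strings
df = pd.DataFrame({'itime': ['2300', '0100', '0500', '1000', '0600']})
df['Time'] = pd.to_datetime(df['itime'], format='%H%M').dt.time

# Bounds of the time window (inclusive on both ends)
start, end = dt.time(0, 0), dt.time(6, 0)

# 'Time' holds plain datetime.time objects, so compare them element by element
mask = df['Time'].apply(lambda t: start <= t <= end)
filtered = df[mask]
print(filtered)
```

Because the Time column holds ordinary datetime.time objects, the comparison is by time of day only and ignores the date column.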

It_is_Chris answered Oct 26 '22 09:10


Just an add-on to Chris's answer: if the conversion fails because the values have no leading zero, apply the following to the DataFrame.

df['itime'] = df['itime'].apply(lambda x: x.zfill(4))

This is needed because the original values are not always padded to four digits. Example: 945 instead of 0945.
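
If the column has already been read as float (as in the question), the same padding idea can probably be applied without re-reading the file. The sketch below assumes the column has no missing values, since casting NaN to int raises an error; the sample values and the name padded are made up for illustration.

```python
import pandas as pd

# Hypothetical float column, as pandas produced it in the question
df = pd.DataFrame({'itime': [1000.0, 600.0, 945.0, 0.0, 2330.0]})

# float -> int drops the ".0", int -> str allows padding to four digits
padded = df['itime'].astype(int).astype(str).str.zfill(4)

# Parse the padded HHMM strings and keep only the time of day
df['Time'] = pd.to_datetime(padded, format='%H%M').dt.time
print(df)
```

Here str.zfill is the vectorized counterpart of calling zfill inside apply.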

Wing Shum answered Oct 26 '22 09:10