Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: How to create a datetime object from Week and Year?

I have a dataframe that provides two integer columns with the Year and Week of the year:

import pandas as pd
import numpy as np
L1 = [43,44,51,2,5,12]
L2 = [2016,2016,2016,2017,2017,2017]
df = pd.DataFrame({"Week":L1,"Year":L2})

df
Out[72]: 
   Week  Year
0    43  2016
1    44  2016
2    51  2016
3     2  2017
4     5  2017
5    12  2017

I need to create a datetime-object from these two numbers.

I tried this, but it throws an error:

df["DT"] = df.apply(lambda x: np.datetime64(x.Year,'Y') + np.timedelta64(x.Week,'W'),axis=1)

Then I tried this, it works but gives the wrong result, that is it ignores the week completely:

df["S"] = df.Week.astype(str)+'-'+df.Year.astype(str)
df["DT"] = df["S"].apply(lambda x: pd.to_datetime(x,format='%W-%Y'))

df
Out[74]: 
   Week  Year        S         DT
0    43  2016  43-2016 2016-01-01
1    44  2016  44-2016 2016-01-01
2    51  2016  51-2016 2016-01-01
3     2  2017   2-2017 2017-01-01
4     5  2017   5-2017 2017-01-01
5    12  2017  12-2017 2017-01-01

I'm really getting lost between Python's datetime, Numpy's datetime64, and pandas Timestamp, can you tell me how it's done correctly?

I'm using Python 3, if that is relevant in any way.

EDIT:

Starting with Python 3.8 the problem is easily solved with a newly introduced method on datetime.date objects: https://docs.python.org/3/library/datetime.html#datetime.date.fromisocalendar

like image 342
Khris Avatar asked Aug 01 '17 11:08

Khris


People also ask

How do you make a panda datetime object?

To create a pandas datetime series in pandas, you can use the date_range function. It takes the parameter start as the starting date, periods as the number of values, and end as the ending date.

How do you create a date object in Python?

To create a date, we can use the datetime() class (constructor) of the datetime module. The datetime() class requires three parameters to create a date: year, month, day.


4 Answers

Try this:

In [19]: pd.to_datetime(df.Year.astype(str), format='%Y') + \
             pd.to_timedelta(df.Week.mul(7).astype(str) + ' days')
Out[19]:
0   2016-10-28
1   2016-11-04
2   2016-12-23
3   2017-01-15
4   2017-02-05
5   2017-03-26
dtype: datetime64[ns]

Initially I have timestamps in s

It's much easier to parse it from UNIX epoch timestamp:

df['Date'] = pd.to_datetime(df['UNIX_Time'], unit='s')

Timing for 10M rows DF:

Setup:

In [26]: df = pd.DataFrame(pd.date_range('1970-01-01', freq='1T', periods=10**7), columns=['date'])

In [27]: df.shape
Out[27]: (10000000, 1)

In [28]: df['unix_ts'] = df['date'].astype(np.int64)//10**9

In [30]: df
Out[30]:
                       date    unix_ts
0       1970-01-01 00:00:00          0
1       1970-01-01 00:01:00         60
2       1970-01-01 00:02:00        120
3       1970-01-01 00:03:00        180
4       1970-01-01 00:04:00        240
5       1970-01-01 00:05:00        300
6       1970-01-01 00:06:00        360
7       1970-01-01 00:07:00        420
8       1970-01-01 00:08:00        480
9       1970-01-01 00:09:00        540
...                     ...        ...
9999990 1989-01-05 10:30:00  599999400
9999991 1989-01-05 10:31:00  599999460
9999992 1989-01-05 10:32:00  599999520
9999993 1989-01-05 10:33:00  599999580
9999994 1989-01-05 10:34:00  599999640
9999995 1989-01-05 10:35:00  599999700
9999996 1989-01-05 10:36:00  599999760
9999997 1989-01-05 10:37:00  599999820
9999998 1989-01-05 10:38:00  599999880
9999999 1989-01-05 10:39:00  599999940

[10000000 rows x 2 columns]

Check:

In [31]: pd.to_datetime(df.unix_ts, unit='s')
Out[31]:
0         1970-01-01 00:00:00
1         1970-01-01 00:01:00
2         1970-01-01 00:02:00
3         1970-01-01 00:03:00
4         1970-01-01 00:04:00
5         1970-01-01 00:05:00
6         1970-01-01 00:06:00
7         1970-01-01 00:07:00
8         1970-01-01 00:08:00
9         1970-01-01 00:09:00
                  ...
9999990   1989-01-05 10:30:00
9999991   1989-01-05 10:31:00
9999992   1989-01-05 10:32:00
9999993   1989-01-05 10:33:00
9999994   1989-01-05 10:34:00
9999995   1989-01-05 10:35:00
9999996   1989-01-05 10:36:00
9999997   1989-01-05 10:37:00
9999998   1989-01-05 10:38:00
9999999   1989-01-05 10:39:00
Name: unix_ts, Length: 10000000, dtype: datetime64[ns]

Timing:

In [32]: %timeit pd.to_datetime(df.unix_ts, unit='s')
10 loops, best of 3: 156 ms per loop

Conclusion: I think 156 milliseconds for converting 10.000.000 rows is not that slow

like image 72
MaxU - stop WAR against UA Avatar answered Oct 19 '22 19:10

MaxU - stop WAR against UA


Like @Gianmario Spacagna mentioned for datetimes higher like 2018 use %V with %G:

L1 = [43,44,51,2,5,12,52,53,1,2,5,52]
L2 = [2016,2016,2016,2017,2017,2017,2018,2018,2019,2019,2019,2019]
df = pd.DataFrame({"Week":L1,"Year":L2})


df['new'] = pd.to_datetime(df.Week.astype(str)+
                           df.Year.astype(str).add('-1') ,format='%V%G-%u')
print (df)
    Week  Year        new
0     43  2016 2016-10-24
1     44  2016 2016-10-31
2     51  2016 2016-12-19
3      2  2017 2017-01-09
4      5  2017 2017-01-30
5     12  2017 2017-03-20
6     52  2018 2018-12-24
7     53  2018 2018-12-31
8      1  2019 2018-12-31
9      2  2019 2019-01-07
10     5  2019 2019-01-28
11    52  2019 2019-12-23
like image 31
jezrael Avatar answered Oct 19 '22 18:10

jezrael


There is something fishy going on with weeks starting from 2019. The ISO-8601 standard assigns the 31st December 2018 to the week 1 of year 2019. The other approaches based on:

pd.to_datetime(df.Week.astype(str)+
                  df.Year.astype(str).add('-2') ,format='%W%Y-%w')

will give shifted results starting from 2019.

In order to be compliant with the ISO-8601 standard you would have to do the following:

import pandas as pd
import datetime

L1 = [52,53,1,2,5,52]
L2 = [2018,2018,2019,2019,2019,2019]
df = pd.DataFrame({"Week":L1,"Year":L2})
df['ISO'] = df['Year'].astype(str) + '-W' + df['Week'].astype(str) + '-1'
df['DT'] = df['ISO'].map(lambda x: datetime.datetime.strptime(x, "%G-W%V-%u"))
print(df)

It prints:

   Week  Year         ISO         DT
0    52  2018  2018-W52-1 2018-12-24
1    53  2018  2018-W53-1 2018-12-31
2     1  2019   2019-W1-1 2018-12-31
3     2  2019   2019-W2-1 2019-01-07
4     5  2019   2019-W5-1 2019-01-28
5    52  2019  2019-W52-1 2019-12-23

The week 53 of 2018 is ignored and mapped to the week 1 of 2019.

Please verify yourself on https://www.epochconverter.com/weeks/2019.

like image 6
Gianmario Spacagna Avatar answered Oct 19 '22 17:10

Gianmario Spacagna


If you want to follow ISO Week Date

Weeks start with Monday. Each week's year is the Gregorian year in which the Thursday falls. The first week of the year, hence, always contains 4 January. ISO week year numbering therefore slightly deviates from the Gregorian for some days close to 1 January.

The following sample code, generates a sequence of 60 Dates, starting from 18Dec2016 Sun and adds the appropriate columns.

It adds:

  • A "Date"
  • Week Day of the "Date"
  • Finds the Week Starting Monday of that "Date"
  • Finds the Year of the Week Starting Monday of that "Date"
  • Adds a Week Number (ISO)
  • Gets the Starting Monday Date, from Year and Week Number

Sample Code Below:

# Generate Some Dates
dft1 = pd.DataFrame(pd.date_range('2016-12-18', freq='D', periods=60))
dft1.columns = ['e_FullDate']
dft1['e_FullDateWeekDay'] = dft1.e_FullDate.dt.day_name().str.slice(0,3)


#Add a Week Start Date (Monday)
dft1['e_week_start'] = dft1['e_FullDate'] - pd.to_timedelta(dft1['e_FullDate'].dt.weekday,
                                                      unit='D')
dft1['e_week_startWeekDay'] = dft1.e_week_start.dt.day_name().str.slice(0,3)

#Add a Week Start Year
dft1['e_week_start_yr'] = dft1.e_week_start.dt.year

#Add a Week Number of Week Start Monday
dft1['e_week_no'] = dft1['e_week_start'].dt.week

#Add a Week Start generate from Week Number and Year
dft1['e_week_start_from_week_no'] = pd.to_datetime(dft1.e_week_no.astype(str)+
                  dft1.e_week_start_yr.astype(str).add('-1') ,format='%W%Y-%w')
dft1['e_week_start_from_week_noWeekDay'] = dft1.e_week_start_from_week_no.dt.day_name().str.slice(0,3)


with pd.option_context('display.max_rows', 999, 'display.max_columns', 0, 'display.max_colwidth', 9999):
    display(dft1)

enter image description here

like image 1
ihightower Avatar answered Oct 19 '22 19:10

ihightower