How to convert columns into one datetime column in pandas?

I have a dataframe whose first three columns are 'MONTH', 'DAY' and 'YEAR'.

Each column holds an integer. Is there a Pythonic way to convert all three columns into a single datetime column while they are in the dataframe?

From:

    M   D     Y  Apples  Oranges
    5   6  1990      12        3
    5   7  1990      14        4
    5   8  1990      15       34
    5   9  1990      23       21

into:

    Datetimes   Apples  Oranges
    1990-05-06      12        3
    1990-05-07      14        4
    1990-05-08      15       34
    1990-05-09      23       21
asked Oct 13 '13 22:10 by user1367204




2 Answers

In pandas 0.13 (coming very soon), this is heavily optimized and quite fast (it is still pretty fast in 0.12); both are orders of magnitude faster than looping over the rows.

    In [3]: df
    Out[3]:
       M  D     Y  Apples  Oranges
    0  5  6  1990      12        3
    1  5  7  1990      14        4
    2  5  8  1990      15       34
    3  5  9  1990      23       21

    In [4]: df.dtypes
    Out[4]:
    M          int64
    D          int64
    Y          int64
    Apples     int64
    Oranges    int64
    dtype: object

    # in 0.12, use this
    In [5]: pd.to_datetime((df.Y*10000+df.M*100+df.D).apply(str),format='%Y%m%d')

    # in 0.13 the above or this will work
    In [5]: pd.to_datetime(df.Y*10000+df.M*100+df.D,format='%Y%m%d')
    Out[5]:
    0   1990-05-06 00:00:00
    1   1990-05-07 00:00:00
    2   1990-05-08 00:00:00
    3   1990-05-09 00:00:00
    dtype: datetime64[ns]
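The arithmetic packs each row into one YYYYMMDD integer (for the first row, 1990*10000 + 5*100 + 6 = 19900506), which the format '%Y%m%d' then parses. Below is a minimal sketch of the same idea for current pandas, assuming the question's column names M, D and Y. It converts to strings explicitly with astype(str) rather than relying on integer input being accepted together with format:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'M': [5, 5, 5, 5],
                   'D': [6, 7, 8, 9],
                   'Y': [1990, 1990, 1990, 1990],
                   'Apples': [12, 14, 15, 23],
                   'Oranges': [3, 4, 34, 21]})

# Pack year, month and day into one YYYYMMDD integer per row, e.g. 19900506
packed = df['Y'] * 10000 + df['M'] * 100 + df['D']

# Parse the packed values as strings with an explicit format
dates = pd.to_datetime(packed.astype(str), format='%Y%m%d')
print(dates)
```

Because the multiplication pads month and day to two digits implicitly, no per-column string formatting is needed; the trick does assume four-digit years.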
answered Oct 03 '22 07:10 by Jeff


Starting with pandas 0.18.1 you can pass the columns to to_datetime directly, but:

  • The columns must be named year, month, day, hour, minute and second.
  • At a minimum, the year, month and day columns are required.

Sample:

    import pandas as pd

    df = pd.DataFrame({'year': [2015, 2016],
                       'month': [2, 3],
                       'day': [4, 5],
                       'hour': [2, 3],
                       'minute': [10, 30],
                       'second': [21, 25]})

    print(df)
    #    year  month  day  hour  minute  second
    # 0  2015      2    4     2      10      21
    # 1  2016      3    5     3      30      25

    print(pd.to_datetime(df[['year', 'month', 'day']]))
    # 0   2015-02-04
    # 1   2016-03-05
    # dtype: datetime64[ns]

    print(pd.to_datetime(df[['year', 'month', 'day', 'hour']]))
    # 0   2015-02-04 02:00:00
    # 1   2016-03-05 03:00:00
    # dtype: datetime64[ns]

    print(pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']]))
    # 0   2015-02-04 02:10:00
    # 1   2016-03-05 03:30:00
    # dtype: datetime64[ns]

    print(pd.to_datetime(df))
    # 0   2015-02-04 02:10:21
    # 1   2016-03-05 03:30:25
    # dtype: datetime64[ns]
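For the question's frame, whose columns are named M, D and Y, one way to meet the naming requirement is to rename the columns before calling to_datetime. A minimal sketch (the rename mapping is just for illustration); it also checks that leaving out 'day' is rejected, since year, month and day are the required minimum:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'M': [5, 5, 5, 5],
                   'D': [6, 7, 8, 9],
                   'Y': [1990, 1990, 1990, 1990],
                   'Apples': [12, 14, 15, 23],
                   'Oranges': [3, 4, 34, 21]})

# Map the short column names to the names to_datetime recognizes
parts = df[['Y', 'M', 'D']].rename(columns={'Y': 'year', 'M': 'month', 'D': 'day'})
dates = pd.to_datetime(parts)
print(dates)

# Without a 'day' column the assembly fails with a ValueError
missing_day_raised = False
try:
    pd.to_datetime(parts[['year', 'month']])
except ValueError as err:
    missing_day_raised = True
    print('ValueError:', err)
```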

Another solution is to pass a dictionary that maps the required names to the existing columns:

    print(df)
    #    M  D     Y  Apples  Oranges
    # 0  5  6  1990      12        3
    # 1  5  7  1990      14        4
    # 2  5  8  1990      15       34
    # 3  5  9  1990      23       21

    print(pd.to_datetime(dict(year=df.Y, month=df.M, day=df.D)))
    # 0   1990-05-06
    # 1   1990-05-07
    # 2   1990-05-08
    # 3   1990-05-09
    # dtype: datetime64[ns]
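To end up with the layout asked for in the question (one datetime column in place of M, D and Y), the assembled dates can be inserted as the first column and the component columns dropped. A minimal sketch; the column name 'Datetimes' is taken from the question's desired output:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'M': [5, 5, 5, 5],
                   'D': [6, 7, 8, 9],
                   'Y': [1990, 1990, 1990, 1990],
                   'Apples': [12, 14, 15, 23],
                   'Oranges': [3, 4, 34, 21]})

# Assemble the dates from a dict whose keys are the names to_datetime expects
datetimes = pd.to_datetime(dict(year=df['Y'], month=df['M'], day=df['D']))

# Drop the component columns and put the new datetime column first
result = df.drop(columns=['M', 'D', 'Y'])
result.insert(0, 'Datetimes', datetimes)
print(result)
```

Using insert keeps the remaining columns in their original order; assigning df['Datetimes'] = datetimes instead would append the column at the end.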
answered Oct 03 '22 06:10 by jezrael