Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert date/time columns in Pandas dataframe

Tags:

I have data sets containing the date (Julian day, column 1), hour (HHMM, column 2) and seconds (column 3) in individual columns:

1  253  2300   0    2.9  114.4  18.42  21.17
1  253  2300  10   3.27  111.2  18.48  21.12
1  253  2300  20   3.22  111.3  18.49  21.09
1  253  2300  30   3.84  106.4  18.52     21
1  253  2300  40   3.75  104.4  18.53  20.85

I'm reading the text file using Pandas as:

columns = ['station','julian_day','hours','seconds','U','Ud','T','RH']
df = pd.read_table(file_name, header=None, names=columns, delim_whitespace=True)

Now I want to convert the date to something more convenient like YYYY-MM-DD HH:MM:SS (The year isn't provided in the data set, but is fixed at 2001).

I tried combining the three columns into one using parse_dates:

df = pd.read_table(file_name, header=None, names=columns, delim_whitespace=True, 
                   parse_dates={'datetime' : ['julian_day','hours','seconds']}) 

which converts the three columns into one string:

In [38]: df['datetime'][0]
Out[38]: '253 2300 0'

I next tried to convert them using date_parser; following this post using something like:

date_parser = lambda x: datetime.datetime.strptime(x, '%j %H%M %s')

The date_parser itself works, but I can't get this to combine with read_table, and I'm pretty much stuck at this point. Is there an easy way to achieve the conversion?

The full minimal (not-so) working example:

import pandas as pd
import datetime
from io import StringIO

data_file = StringIO("""\
 1  253  2300   0    2.9  114.4  18.42  21.17
 1  253  2300  10   3.27  111.2  18.48  21.12
 1  253  2300  20   3.22  111.3  18.49  21.09
 1  253  2300  30   3.84  106.4  18.52     21
 1  253  2300  40   3.75  104.4  18.53  20.85
""")

date_parser = lambda x: datetime.datetime.strptime(x, '%j %H%M %S')

columns = ['station','julian_day','hours','seconds','U','Ud','T','RH']
df = pd.read_table(data_file, header=None, names=columns, delim_whitespace=True,\
                   parse_dates={'datetime' : ['julian_day','hours','seconds']})
like image 802
Bart Avatar asked Sep 02 '16 08:09

Bart


People also ask

How can I convert column to datetime in pandas?

Use astype() function to convert the string column to datetime data type in pandas DataFrame. The data type of the DateTime isdatetime64[ns] ; should be given as the parameter. Yields same output as above.

How do I change time format in pandas?

The date-time default format is “YYYY-MM-DD”. Hence, December 8, 2020, in the date format will be presented as “2020-12-08”. The datetime format can be changed and by changing we mean changing the sequence and style of the format.

What is the method in pandas to change values into date time data type?

Code #1 : Convert Pandas dataframe column type from string to datetime format using pd. to_datetime() function.


1 Answers

Not sure if I am missing something but this seems to work:

import pandas as pd
import datetime
from io import StringIO

data_file = StringIO("""\
 1  253  2300   0    2.9  114.4  18.42  21.17
 1  253  2300  10   3.27  111.2  18.48  21.12
 1  253  2300  20   3.22  111.3  18.49  21.09
 1  253  2300  30   3.84  106.4  18.52     21
 1  253  2300  40   3.75  104.4  18.53  20.85
""")

date_parser = lambda x: datetime.datetime.strptime(("2001 " + x), '%Y %j %H%M %S')

columns = ['station','julian_day','hours','seconds','U','Ud','T','RH']
df = pd.read_table(data_file, header=None, names=columns, delim_whitespace=True,\
                   date_parser = date_parser,parse_dates={'datetime' : ['julian_day','hours','seconds']})

I just add the date_parser parameter in read_table and hard codded 2001 in parsing function.

like image 69
milos.ai Avatar answered Oct 10 '22 23:10

milos.ai