I have data sets containing the date (Julian day, column 1), hour (HHMM, column 2) and seconds (column 3) in individual columns:
1 253 2300 0 2.9 114.4 18.42 21.17
1 253 2300 10 3.27 111.2 18.48 21.12
1 253 2300 20 3.22 111.3 18.49 21.09
1 253 2300 30 3.84 106.4 18.52 21
1 253 2300 40 3.75 104.4 18.53 20.85
I'm reading the text file using Pandas as:
columns = ['station','julian_day','hours','seconds','U','Ud','T','RH']
df = pd.read_table(file_name, header=None, names=columns, delim_whitespace=True)
Now I want to convert the date to something more convenient like YYYY-MM-DD HH:MM:SS. (The year isn't provided in the data set, but is fixed at 2001.)
I tried combining the three columns into one using parse_dates:
df = pd.read_table(file_name, header=None, names=columns, delim_whitespace=True,
parse_dates={'datetime' : ['julian_day','hours','seconds']})
which converts the three columns into one string:
In [38]: df['datetime'][0]
Out[38]: '253 2300 0'
I next tried to convert them using date_parser, following this post, with something like:
date_parser = lambda x: datetime.datetime.strptime(x, '%j %H%M %S')
The date_parser itself works, but I can't get this to combine with read_table, and I'm pretty much stuck at this point. Is there an easy way to achieve the conversion?
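To illustrate what the parser does on one of the combined strings, here is a minimal sketch using only the standard library. It assumes the string layout shown in the output above ('253 2300 0') and the fixed year 2001 stated earlier. The names no_year and with_year are made up for the illustration.

```python
import datetime

# Without a %Y directive, strptime falls back to the year 1900.
no_year = datetime.datetime.strptime('253 2300 0', '%j %H%M %S')

# Prefixing the known year (2001, per the question) gives the intended date.
with_year = datetime.datetime.strptime('2001 ' + '253 2300 0', '%Y %j %H%M %S')

print(no_year)    # 1900-09-10 23:00:00
print(with_year)  # 2001-09-10 23:00:00
```

So the format string parses the day of year and time correctly, but the year has to be supplied somewhere, since it is not in the file.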
The full minimal (not-so) working example:
import pandas as pd
import datetime
from io import StringIO
data_file = StringIO("""\
1 253 2300 0 2.9 114.4 18.42 21.17
1 253 2300 10 3.27 111.2 18.48 21.12
1 253 2300 20 3.22 111.3 18.49 21.09
1 253 2300 30 3.84 106.4 18.52 21
1 253 2300 40 3.75 104.4 18.53 20.85
""")
date_parser = lambda x: datetime.datetime.strptime(x, '%j %H%M %S')
columns = ['station','julian_day','hours','seconds','U','Ud','T','RH']
df = pd.read_table(data_file, header=None, names=columns, delim_whitespace=True,\
parse_dates={'datetime' : ['julian_day','hours','seconds']})
Use the astype() function to convert a string column to the datetime data type in a pandas DataFrame; the target dtype, datetime64[ns], is passed as the argument.
pandas displays dates in the "YYYY-MM-DD" format by default, so December 8, 2020 is shown as "2020-12-08". The format can be changed, meaning the order and style of the date components.
A string column can also be converted to datetime with the pd.to_datetime() function.
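As a rough sketch of the two conversions mentioned here: astype() and pd.to_datetime() both handle ISO-style strings directly, while the combined strings from the question need an explicit format. The Series names (iso, raw, parsed) and the sample values are illustrative, and the year 2001 is the assumption stated in the question.

```python
import pandas as pd

# ISO-formatted strings convert directly with either call.
iso = pd.Series(['2001-09-10 23:00:00', '2001-09-10 23:00:10'])
via_astype = iso.astype('datetime64[ns]')
via_to_datetime = pd.to_datetime(iso)

# 'day-of-year HHMM seconds' strings are not ISO-formatted, so the
# layout is spelled out; the year 2001 is prefixed because the data lacks it.
raw = pd.Series(['253 2300 0', '253 2300 10'])
parsed = pd.to_datetime('2001 ' + raw, format='%Y %j %H%M %S')

print(parsed)
```

Note that this string-based route assumes the hours field is always written with four digits; a value such as 30 (for 00:30) would need zero-padding first.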
Not sure if I am missing something but this seems to work:
import pandas as pd
import datetime
from io import StringIO
data_file = StringIO("""\
1 253 2300 0 2.9 114.4 18.42 21.17
1 253 2300 10 3.27 111.2 18.48 21.12
1 253 2300 20 3.22 111.3 18.49 21.09
1 253 2300 30 3.84 106.4 18.52 21
1 253 2300 40 3.75 104.4 18.53 20.85
""")
date_parser = lambda x: datetime.datetime.strptime(("2001 " + x), '%Y %j %H%M %S')
columns = ['station','julian_day','hours','seconds','U','Ud','T','RH']
df = pd.read_table(data_file, header=None, names=columns, delim_whitespace=True,\
date_parser = date_parser,parse_dates={'datetime' : ['julian_day','hours','seconds']})
I just added the date_parser parameter to read_table and hard-coded 2001 in the parsing function.
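Newer pandas releases deprecate date_parser, delim_whitespace and the dict form of parse_dates, so here is an alternative sketch that avoids all three by building the timestamp after reading. This is not the approach above, just one possible variant: it uses integer arithmetic on the columns instead of string parsing, and the year_start name and the fixed year 2001 are assumptions taken from the question.

```python
import pandas as pd
from io import StringIO

data_file = StringIO("""\
1 253 2300 0 2.9 114.4 18.42 21.17
1 253 2300 10 3.27 111.2 18.48 21.12
1 253 2300 20 3.22 111.3 18.49 21.09
1 253 2300 30 3.84 106.4 18.52 21
1 253 2300 40 3.75 104.4 18.53 20.85
""")

columns = ['station', 'julian_day', 'hours', 'seconds', 'U', 'Ud', 'T', 'RH']
df = pd.read_csv(data_file, header=None, names=columns, sep=r'\s+')

# Assumption: the year is fixed at 2001, as stated in the question.
year_start = pd.Timestamp('2001-01-01')

# Day 1 is January 1st, hence the "- 1"; HHMM is split into hours and minutes.
df['datetime'] = (
    year_start
    + pd.to_timedelta(df['julian_day'] - 1, unit='D')
    + pd.to_timedelta(df['hours'] // 100, unit='h')
    + pd.to_timedelta(df['hours'] % 100, unit='min')
    + pd.to_timedelta(df['seconds'], unit='s')
)
df = df.drop(columns=['julian_day', 'hours', 'seconds'])
print(df[['datetime', 'U']])
```

Because the HHMM column is treated as an integer, values without leading zeros (for example 30 for 00:30) are handled without any padding.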