I'm processing a Japanese CSV file using Python 3 and pandas.
The CSV has a column representing dates in a format like H29.12.1. I learned that this is a Japanese era format, and that H29.12.1 corresponds to 2017-12-01 (in YYYY-MM-DD format).
My question is: does Python or pandas provide a function to convert this Japanese date column to YYYY-MM-DD format?
The H stands for 平成 (Heisei), the era that began in 1989.
With that information, we can parse the date and compute the Gregorian year as 1989 + N - 1, where N is the era year (the number right after H). For example, H29 gives 1989 + 29 - 1 = 2017.
Here's an example function:
import datetime as dt

def parse_heisei(date_string, sep='.'):
    y, m, d = date_string.split(sep)
    return dt.date(year=1989 + int(y[1:]) - 1, month=int(m), day=int(d))
You can then apply this function to your DataFrame's date column.
Example:
my_gregorian_dates = df.heisei_dates.apply(parse_heisei)
I'm sure you can also find a library that does this automatically, but I don't think the standard datetime module or pandas comes with this built in. Regardless, the function is pretty simple to write.
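As a quick check of the arithmetic, here is a minimal, standard-library-only sketch that runs the function on the date from the question and formats the result as YYYY-MM-DD with `isoformat()`. It assumes every input starts with a single era letter and is a Heisei date.

```python
import datetime as dt

def parse_heisei(date_string, sep="."):
    """Convert a Heisei date such as 'H29.12.1' to a datetime.date."""
    y, m, d = date_string.split(sep)
    # Heisei 1 is 1989, so Heisei N is 1989 + N - 1.
    return dt.date(year=1989 + int(y[1:]) - 1, month=int(m), day=int(d))

# date.isoformat() yields the zero-padded YYYY-MM-DD string.
print(parse_heisei("H29.12.1").isoformat())  # 2017-12-01
```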
I don't think there is a pandas function that handles the Japanese imperial calendar; you may need to write your own function to convert the dates.
import re
from datetime import datetime
import pandas as pd

def jp_date_to_yyyymmdd(dt):
    # Expect an era letter followed by year.month.day, e.g. H29.12.1
    if re.match(r'\w\d+\.\d+\.\d+', dt) is None:
        return None
    elif dt[0] == 'H':
        # HEISEI - 1989-01-08
        ymd = [int(x) for x in re.split(r'\.', dt[1:])]
        # pd.datetime is not available in recent pandas; use the standard datetime class
        return datetime(1988 + ymd[0], ymd[1], ymd[2])
    elif dt[0] == 'S':
        # SHOWA - 1926-12-25
        ymd = [int(x) for x in re.split(r'\.', dt[1:])]
        return datetime(1925 + ymd[0], ymd[1], ymd[2])
    else:
        # You may add more conditions to handle older dates
        return None

df = pd.DataFrame({'jp_date': ['H29.12.1', 'H20.12.22', '']})
df.jp_date.apply(jp_date_to_yyyymmdd)
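The era-by-era branches above can also be written as a lookup table, which makes it easier to add more eras. The following is only a sketch using the standard library: the names `ERA_FIRST_YEAR` and `era_date_to_iso` are hypothetical, the `T` (Taisho, from 1912) and `R` (Reiwa, from 2019) entries are additions not covered by the answers here, and it assumes every input is a string.

```python
import re
from datetime import date

# Gregorian year corresponding to year 1 of each era (hypothetical helper table).
ERA_FIRST_YEAR = {"T": 1912, "S": 1926, "H": 1989, "R": 2019}

# One era letter, then year.month.day separated by literal dots.
_PATTERN = re.compile(r"([TSHR])(\d+)\.(\d+)\.(\d+)")

def era_date_to_iso(text):
    """Return 'YYYY-MM-DD' for strings like 'H29.12.1', or None if unparseable."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    era, year, month, day = match.groups()
    try:
        # Era year N maps to first_year + N - 1.
        gregorian = date(ERA_FIRST_YEAR[era] + int(year) - 1, int(month), int(day))
    except ValueError:
        # Impossible calendar dates such as month 13 end up here.
        return None
    return gregorian.isoformat()

print(era_date_to_iso("H29.12.1"))  # 2017-12-01
```

It can be applied to a column in the same way as the function above, for example `df.jp_date.apply(era_date_to_iso)`. Note that it does not check whether a date actually falls inside the era it names.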