I have a dataframe of 13 columns and 55,000 rows I am trying to convert 5 of those rows to datetime, right now they are returning the type 'object' and I need to transform this data for machine learning I know that if I do
data['birth_date'] = pd.to_datetime(data[birth_date], errors ='coerce')
it will return a datetime column but I want to do it for 4 other columns as well, is there one line that I can write to call all of them? I dont think I can index like
data[:,7:12]
thanks!
pandas. to_datetime() method is used to change String/Object time to date type (datetime64[ns]).
If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them.
You can use apply to iterate through each column using pd.to_datetime
data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce')  As part of the changes in pandas 1.3.0, iloc/loc will no longer update the column dtype on assignment. Use column labels directly instead:
cols = data.columns[7:12] data[cols] = data[cols].apply(pd.to_datetime, errors='coerce') 
                        my_df[['column1','column2']] =      my_df[['column1','column2']].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S.%f')   Note: of course the format can be changed as required.
If performance is a concern I would advice to use the following function to convert those columns to date_time:
def lookup(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    dates = {date:pd.to_datetime(date) for date in s.unique()}
    return s.apply(lambda v: dates[v])
to_datetime: 5799 ms
dateutil:    5162 ms
strptime:    1651 ms
manual:       242 ms
lookup:        32 ms
Source: https://github.com/sanand0/benchmarks/tree/master/date-parse
If you rather want to convert at load time, you could do something like this
date_columns = ['c1','c2', 'c3', 'c4', 'c5']
data = pd.read_csv('file_to_read.csv', parse_dates=date_columns)
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With