Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

change multiple columns in pandas dataframe to datetime

I have a dataframe of 13 columns and 55,000 rows I am trying to convert 5 of those rows to datetime, right now they are returning the type 'object' and I need to transform this data for machine learning I know that if I do

data['birth_date'] = pd.to_datetime(data[birth_date], errors ='coerce')

it will return a datetime column but I want to do it for 4 other columns as well, is there one line that I can write to call all of them? I dont think I can index like

data[:,7:12]

thanks!

like image 417
kwashington122 Avatar asked Jan 06 '17 20:01

kwashington122


People also ask

What is the method in pandas to change values into datetime data type?

pandas. to_datetime() method is used to change String/Object time to date type (datetime64[ns]).

What does Parse_dates do in pandas?

If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them.


4 Answers

You can use apply to iterate through each column using pd.to_datetime

data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce') 

As part of the changes in pandas 1.3.0, iloc/loc will no longer update the column dtype on assignment. Use column labels directly instead:

cols = data.columns[7:12] data[cols] = data[cols].apply(pd.to_datetime, errors='coerce') 
like image 174
Ted Petrou Avatar answered Oct 31 '22 04:10

Ted Petrou


my_df[['column1','column2']] =      my_df[['column1','column2']].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S.%f') 

Note: of course the format can be changed as required.

like image 35
mel el Avatar answered Oct 31 '22 03:10

mel el


If performance is a concern I would advice to use the following function to convert those columns to date_time:

def lookup(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    dates = {date:pd.to_datetime(date) for date in s.unique()}
    return s.apply(lambda v: dates[v])

to_datetime: 5799 ms
dateutil:    5162 ms
strptime:    1651 ms
manual:       242 ms
lookup:        32 ms

Source: https://github.com/sanand0/benchmarks/tree/master/date-parse

like image 38
SerialDev Avatar answered Oct 31 '22 03:10

SerialDev


If you rather want to convert at load time, you could do something like this

date_columns = ['c1','c2', 'c3', 'c4', 'c5']
data = pd.read_csv('file_to_read.csv', parse_dates=date_columns)
like image 24
smishra Avatar answered Oct 31 '22 02:10

smishra