change multiple columns in pandas dataframe to datetime

Tags:

python

datetime

pandas

I have a dataframe of 13 columns and 55,000 rows. I am trying to convert 5 of those columns to datetime; right now they have the dtype 'object', and I need to transform this data for machine learning. I know that if I do

data['birth_date'] = pd.to_datetime(data['birth_date'], errors='coerce')

it will return a datetime column, but I want to do the same for 4 other columns as well. Is there one line I can write to convert all of them? I don't think I can index like

data[:,7:12]

thanks!

asked Jan 06 '17 20:01

kwashington122

4 Answers

You can use apply to call pd.to_datetime on each column:

data.iloc[:, 7:12] = data.iloc[:, 7:12].apply(pd.to_datetime, errors='coerce')

As part of the changes in pandas 1.3.0, iloc/loc will no longer update the column dtype on assignment. Use column labels directly instead:

cols = data.columns[7:12]
data[cols] = data[cols].apply(pd.to_datetime, errors='coerce')
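A self-contained sketch of the label-based version, using a small hypothetical DataFrame (the column names `name`, `birth_date` and `hire_date` are made up for illustration). The columns are picked by position through `data.columns[...]` but assigned through their labels, and `errors='coerce'` turns unparseable strings into `NaT`.

```python
import pandas as pd

# Hypothetical data: two date-like string columns plus one non-date column
data = pd.DataFrame({
    "name": ["a", "b", "c"],
    "birth_date": ["1990-01-15", "1985-07-30", "not a date"],
    "hire_date": ["2015-03-01", "2018-11-20", "2020-06-05"],
})

# Choose the columns by position, then assign through their labels
cols = data.columns[1:3]
data[cols] = data[cols].apply(pd.to_datetime, errors="coerce")

# birth_date and hire_date are now datetime columns; "not a date" became NaT
print(data.dtypes)
```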

answered Oct 31 '22 04:10

Ted Petrou

my_df[['column1','column2']] = my_df[['column1','column2']].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S.%f')

Note: change the format string to match the layout of the dates in your data.
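A minimal sketch with an explicit `format`, assuming (hypothetically) that both columns hold strings such as `2021-05-04 13:45:00.250000`. With the default `errors='raise'`, a value that does not match the format raises a `ValueError`; adding `errors='coerce'`, as below, turns it into `NaT` instead.

```python
import pandas as pd

# Hypothetical columns holding timestamps with fractional seconds
my_df = pd.DataFrame({
    "column1": ["2021-05-04 13:45:00.250000", "2021-05-05 08:00:01.000000"],
    "column2": ["2022-01-01 00:00:00.500000", "bad value"],
})

cols = ["column1", "column2"]
# The format must match the text exactly; errors="coerce" maps anything
# that does not match (here "bad value") to NaT instead of raising
my_df[cols] = my_df[cols].apply(
    pd.to_datetime, format="%Y-%m-%d %H:%M:%S.%f", errors="coerce"
)
print(my_df)
```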

answered Oct 31 '22 03:10

mel el

If performance is a concern, I would advise using the following function to convert those columns to datetime:

import pandas as pd

def lookup(s):
    """
    This is an extremely fast approach to datetime parsing.
    For large data, the same dates are often repeated. Rather than
    re-parse these, we store all unique dates, parse them, and
    use a lookup to convert all dates.
    """
    dates = {date:pd.to_datetime(date) for date in s.unique()}
    return s.apply(lambda v: dates[v])

Benchmark timings reported by the source below:

to_datetime: 5799 ms
dateutil:    5162 ms
strptime:    1651 ms
manual:       242 ms
lookup:        32 ms

Source: https://github.com/sanand0/benchmarks/tree/master/date-parse
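To apply the same idea to several columns at once, the function can be passed to `DataFrame.apply`. The sketch below is a variant of `lookup` rather than the exact function above: it uses `Series.map` with the dict instead of `apply` with a lambda, and adds `errors="coerce"` (both are my choices). Note that `pd.to_datetime` also has its own `cache` parameter (default `True`) that converts unique values once, so it is worth timing both approaches on your own data with a current pandas version.

```python
import pandas as pd

def lookup(s: pd.Series) -> pd.Series:
    # Parse each distinct value once, then map every row to its parsed result
    dates = {date: pd.to_datetime(date, errors="coerce") for date in s.unique()}
    return s.map(dates)

# Hypothetical data with many repeated date strings
data = pd.DataFrame({
    "start": ["2020-01-01", "2020-01-02", "2020-01-01"] * 1000,
    "end": ["2020-02-01", "2020-02-01", "2020-02-03"] * 1000,
})

cols = ["start", "end"]
# lookup is called once per column
data[cols] = data[cols].apply(lookup)
print(data.dtypes)
```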

answered Oct 31 '22 03:10

SerialDev

If you would rather convert at load time, you can pass the column names to parse_dates:

date_columns = ['c1','c2', 'c3', 'c4', 'c5']
data = pd.read_csv('file_to_read.csv', parse_dates=date_columns)
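A runnable sketch of the load-time approach, using an in-memory CSV through `io.StringIO` in place of `file_to_read.csv` (the columns `id`, `c1` and `c2` and their values are hypothetical). Columns listed in `parse_dates` come back as datetime while the others keep their inferred types. If a listed column contains a value that cannot be parsed, `read_csv` leaves that column as `object`, so a follow-up `pd.to_datetime(..., errors='coerce')` may still be needed.

```python
import io
import pandas as pd

# In-memory stand-in for file_to_read.csv (hypothetical contents)
csv_text = """id,c1,c2
1,2017-01-06,2017-02-01
2,2017-01-07,2017-03-15
"""

date_columns = ["c1", "c2"]
data = pd.read_csv(io.StringIO(csv_text), parse_dates=date_columns)
print(data.dtypes)
```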

answered Oct 31 '22 02:10

smishra
