I have a .csv file that has 2 separate columns for 'Date' and 'Time'. I read the file like this:
data1 = pd.read_csv('filename.csv', parse_dates=['Date', 'Time'])
But it seems that only the 'Date' column is in time format, while the 'Time' column is still a string or in a format other than time format.
When I do the following:
data0 = pd.read_csv('filename.csv')
data0['Date'] = pd.to_datetime(data0['Date'])
data0['Time'] = pd.to_datetime(data0['Time'])
It gives the dataframe I want, but takes quite some time. So what's the fastest way to read in the file and convert the date and time from a string format?
The .csv file is like this:
         Date      Time      Open      High       Low     Close
0  2004-04-12   8:31 AM  1139.870  1140.860  1139.870  1140.860
1  2005-04-12  10:31 AM  1141.219  1141.960  1141.219  1141.960
2  2006-04-12  12:33 PM  1142.069  1142.290  1142.069  1142.120
3  2007-04-12   3:24 PM  1142.240  1143.140  1142.240  1143.140
4  2008-04-12   5:32 PM  1143.350  1143.589  1143.350  1143.589
Thanks!
In your case, 'Time' is in AM/PM format, which takes more time to parse.
You can pass an explicit format to speed up the to_datetime() method.
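import pandas as pd

data0 = pd.read_csv('filename.csv')
# %Y - year including the century
# %m - month (01 to 12)
# %d - day of the month (01 to 31)
# The dates in the file use dashes (2004-04-12), so the format must too.
data0['Date'] = pd.to_datetime(data0['Date'], format="%Y-%m-%d")
# %I - hour, using a 12-hour clock (01 to 12)
# %M - minute
# %p - either AM or PM according to the given time value
# data0['Time'] = pd.to_datetime(data0['Time'], format="%I:%M %p")  -> around 1 sec
# Note: pd.datetools exists only in older pandas releases; on current
# versions, use the pd.to_datetime line above instead.
data0['Time'] = pd.datetools.to_time(data0['Time'], format="%I:%M %p")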
For more information on these methods, see: Pandas Tools
For more format options, see the datetime format directives.
For 500K rows, it improved the speed from around 60 seconds to 0.01 seconds on my system.
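As a self-contained illustration of the explicit-format approach, here is a minimal sketch that runs without a file on disk. It assumes the file is comma-separated (the question only shows the printed dataframe, so the exact layout is a guess) and uses `.dt.time` to keep only the clock part. That is one possible choice, not necessarily what the original code produced.

```python
import io

import pandas as pd

# Assumed comma-separated layout of the rows shown in the question.
csv_text = """Date,Time,Open,High,Low,Close
2004-04-12,8:31 AM,1139.870,1140.860,1139.870,1140.860
2005-04-12,10:31 AM,1141.219,1141.960,1141.219,1141.960
2006-04-12,12:33 PM,1142.069,1142.290,1142.069,1142.120
2007-04-12,3:24 PM,1142.240,1143.140,1142.240,1143.140
2008-04-12,5:32 PM,1143.350,1143.589,1143.350,1143.589
"""

df = pd.read_csv(io.StringIO(csv_text))

# An explicit format spares pandas from guessing the layout of each string.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# Parsing a time-only string yields a datetime dated 1900-01-01;
# .dt.time keeps just the time of day as datetime.time objects.
df["Time"] = pd.to_datetime(df["Time"], format="%I:%M %p").dt.time

print(df[["Date", "Time"]])
```

Note that after `.dt.time` the 'Time' column holds plain Python `datetime.time` objects (dtype `object`), so if you need datetime arithmetic later it may be better to skip that step.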
You can also use:
# Combine date & time directly from the string columns
# (this works on the raw strings, i.e. before converting them with to_datetime)
pd.Timestamp(data0['Date'][0] + " " + data0['Time'][0])
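The same idea can be applied to the whole column at once rather than to a single row. This is a rough sketch: the small frame stands in for the raw string columns from the question, and the `DateTime` column name is made up for the example.

```python
import pandas as pd

# Hypothetical frame holding the question's two columns as raw strings.
df = pd.DataFrame({
    "Date": ["2004-04-12", "2005-04-12", "2006-04-12"],
    "Time": ["8:31 AM", "10:31 AM", "12:33 PM"],
})

# Concatenate the strings element-wise, then parse the result once
# with a format covering both the date and the 12-hour time.
df["DateTime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%Y-%m-%d %I:%M %p"
)

print(df["DateTime"])
```

This gives a single datetime64 column, which is usually easier to sort, index, or resample on than separate date and time columns.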