 

The fastest way to parse dates in Python when reading a .csv file?

I have a .csv file that has two separate columns, 'Date' and 'Time'. I read the file like this:

import pandas as pd
data1 = pd.read_csv('filename.csv', parse_dates=['Date', 'Time'])

But it seems that only the 'Date' column is parsed as a datetime, while the 'Time' column is still a string (or some other non-datetime type).

When I do the following:

data0 = pd.read_csv('filename.csv')
data0['Date'] = pd.to_datetime(data0['Date'])
data0['Time'] = pd.to_datetime(data0['Time'])

This gives the DataFrame I want, but it takes quite a long time. What's the fastest way to read in the file and convert the date and time columns from strings?

The data in the .csv file looks like this:

              Date      Time      Open       High       Low     Close  
0       2004-04-12    8:31 AM  1139.870  1140.860  1139.870  1140.860       
1       2005-04-12   10:31 AM  1141.219  1141.960  1141.219  1141.960       
2       2006-04-12   12:33 PM  1142.069  1142.290  1142.069  1142.120       
3       2007-04-12    3:24 PM  1142.240  1143.140  1142.240  1143.140       
4       2008-04-12    5:32 PM  1143.350  1143.589  1143.350  1143.589       

Thanks!

asked Jul 28 '16 at 20:07 by Cofeinnie Bonda





1 Answer

In your case, 'Time' is in 12-hour AM/PM format, which is slow to parse when pandas has to work out the format by itself.

You can pass an explicit format to to_datetime() to speed it up.

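import pandas as pd

data0=pd.read_csv('filename.csv')

# %Y - year including the century
# %m - month (01 to 12)
# %d - day of the month (01 to 31)
# the dates in the file look like 2004-04-12, so the separator is '-'
data0['Date']=pd.to_datetime(data0['Date'], format="%Y-%m-%d")

# %I - hour, using a 12-hour clock (01 to 12)
# %M - minute
# %p - either AM or PM according to the given time value
# data0['Time']=pd.to_datetime(data0['Time'], format="%I:%M %p") -> around 1 sec
# Older pandas: to_time returns datetime.time objects
# data0['Time']=pd.datetools.to_time(data0['Time'], format="%I:%M %p")
# pd.datetools was deprecated and later removed; on current pandas use:
data0['Time']=pd.to_datetime(data0['Time'], format="%I:%M %p").dt.time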
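To try the approach end to end without `filename.csv`, here is a minimal sketch that inlines the five sample rows from the question as CSV text (the comma-separated layout is an assumption, since the question only shows the data as a printed DataFrame). It passes explicit formats for both columns and keeps only the time of day with `.dt.time`, which yields `datetime.time` objects much like the older `to_time` helper did.

```python
import io

import pandas as pd

# The five sample rows from the question, inlined so the example runs without
# 'filename.csv' (comma-separated layout assumed).
CSV_TEXT = """Date,Time,Open,High,Low,Close
2004-04-12,8:31 AM,1139.870,1140.860,1139.870,1140.860
2005-04-12,10:31 AM,1141.219,1141.960,1141.219,1141.960
2006-04-12,12:33 PM,1142.069,1142.290,1142.069,1142.120
2007-04-12,3:24 PM,1142.240,1143.140,1142.240,1143.140
2008-04-12,5:32 PM,1143.350,1143.589,1143.350,1143.589
"""

data0 = pd.read_csv(io.StringIO(CSV_TEXT))

# ISO-style dates: an explicit format means pandas does not have to guess it.
data0["Date"] = pd.to_datetime(data0["Date"], format="%Y-%m-%d")

# 12-hour clock times: parse with %I/%p, then keep only the time of day.
data0["Time"] = pd.to_datetime(data0["Time"], format="%I:%M %p").dt.time

print(data0.dtypes)
print(data0[["Date", "Time"]])
```

'Date' ends up as a `datetime64` column, while 'Time' holds plain `datetime.time` objects (dtype `object`), since pandas has no dedicated time-of-day dtype.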

For more information on these methods, see: Pandas Tools

For more format options, see the datetime format directives.
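The `%I`, `%M` and `%p` directives follow Python's `strptime` rules, so they can be tried on single values with the standard library alone. A small sketch, assuming the default C/English locale (the AM/PM marker matched by `%p` is locale-dependent); it also shows the two edge cases of the 12-hour clock, where 12 PM is noon and 12 AM is midnight.

```python
from datetime import datetime

# A few time strings in the question's "H:MM AM/PM" layout.
samples = ["8:31 AM", "12:33 PM", "3:24 PM", "12:05 AM"]

# %I = hour on a 12-hour clock, %M = minute, %p = AM/PM marker.
# strptime fills in a default date (1900-01-01), so keep only .time().
parsed = {s: datetime.strptime(s, "%I:%M %p").time() for s in samples}

for text, t in parsed.items():
    print(f"{text!r:>10} -> {t}")
```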

For 500K rows, this improved the speed from around 60 seconds to 0.01 seconds on my system.
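The size of the speed-up depends on the pandas version and the data, so the figures above are one measurement. A rough way to time it yourself is sketched below on hypothetical synthetic data (the row count and random values are assumptions, not the original file). Note that pandas 2.0 and later try to infer a consistent format from the first value by default, so the gap may be much smaller than on older versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: N random "H:MM AM/PM" strings standing in for
# the 500K-row file (N is kept small so the sketch runs quickly).
N = 20_000
rng = np.random.default_rng(0)
hours = rng.integers(1, 13, size=N).tolist()
minutes = rng.integers(0, 60, size=N).tolist()
ampm = rng.choice(["AM", "PM"], size=N).tolist()
times = pd.Series([f"{h}:{m:02d} {p}" for h, m, p in zip(hours, minutes, ampm)])


def parse_inferred():
    # No format given: pandas has to work out the layout by itself.
    return pd.to_datetime(times)


def parse_explicit():
    # Explicit format: %I = 12-hour clock hour, %M = minute, %p = AM/PM.
    return pd.to_datetime(times, format="%I:%M %p")


t_inferred = timeit.timeit(parse_inferred, number=1)
t_explicit = timeit.timeit(parse_explicit, number=1)
print(f"inferred: {t_inferred:.3f} s   explicit: {t_explicit:.3f} s")

# The calendar date each approach fills in can differ, so compare only the
# time of day.
same = parse_inferred().dt.time.equals(parse_explicit().dt.time)
print("same times of day:", same)
```

`number=1` gives a single noisy sample; running it a few times (or raising `N`) gives a steadier picture.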

You can also combine the date and time directly from their string form:

# Combine date & time directly from the raw string columns
# (i.e. before they are converted as above); this builds a Timestamp for row 0
pd.Timestamp(data0['Date'][0] + " " + data0['Time'][0])
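`pd.Timestamp` on one concatenated string only handles a single row. To get one datetime column for the whole frame, a vectorized sketch is to concatenate the two string columns and parse them once with a combined format; the small `raw` frame below is hypothetical and stands in for the columns as read from the CSV before any conversion.

```python
import pandas as pd

# Hypothetical raw string columns, as read from the CSV before any conversion.
raw = pd.DataFrame({
    "Date": ["2004-04-12", "2005-04-12", "2006-04-12"],
    "Time": ["8:31 AM", "10:31 AM", "12:33 PM"],
})

# Concatenate the strings row-wise, then parse once with an explicit format
# covering both parts; the result is a single datetime64 column.
raw["DateTime"] = pd.to_datetime(
    raw["Date"] + " " + raw["Time"], format="%Y-%m-%d %I:%M %p"
)
print(raw["DateTime"])
```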
answered Sep 30 '22 at 17:09 by RAVI
