I'm exploring Pandas - trying to learn and apply it. Currently I have a csv file populated with a financial timeseries data of following structure:
date, time, open, high, low, close, volume
2003.04.08,12:00,1.06830,1.06960,1.06670,1.06690,446
2003.04.08,13:00,1.06700,1.06810,1.06570,1.06630,433
2003.04.08,14:00,1.06650,1.06810,1.06510,1.06670,473
2003.04.08,15:00,1.06670,1.06890,1.06630,1.06850,556
2003.04.08,16:00,1.06840,1.07050,1.06610,1.06680,615
Now I want to convert the csv data into a pandas DataFrame object, so that date and time fields merge and become the DateTimeIndex of the DataFrame like this:
df = pa.read_csv(path,
names = ['date', 'time', 'open', 'high', 'low', 'close', 'vol'],
parse_dates = {'dateTime': ['date', 'time']},
index_col = 'dateTime')
This works yielding a nice DataFrame object:
<class 'pandas.core.frame.DataFrame'>
Index: 8676 entries, 2003.04.08 12:00 to nan nan
Data columns (total 5 columns):
open 8675 non-null values
high 8675 non-null values
low 8675 non-null values
close 8675 non-null values
vol 8675 non-null values
dtypes: float64(5)
But upon inspection it turns out that the Index is not a DataTimeIndex but unicode strings instead:
type(df.index)
>>> pandas.core.index.Index
df.index
>>> Index([u'2003.04.08 12:00', u'2003.04.08 13:00', u'2003.04.08 14:00', ....
So read_csv
parsed the date and time fields, merged them but did not create a DateTimeIndex. As far as I understood from the documentation a new datastructure object supplied with a list of datetime objects should automatically create a DateTimeIndex. Am I wrong? Is the DataFrame object an exception?
I also tried to convert the current index like this:
df.index = pa.to_datetime(df.index)
but no changes have been made to the index and it is still in unicode format. I begin to suspect the default parsing functions aren't doing their job, but I don't get any error messages from them.
How to get a working DateTimeIndex in a DateFrame in this situation?
Solution:
df = pa.read_csv(path,
names = ['date', 'time', 'open', 'high', 'low', 'close', 'vol'],
parse_dates={'datetime':['date','time']},
keep_date_col = True,
index_col='datetime'
)
now apply the lambda function, doing what the parser should have done:
df['datetime'] = df.apply(lambda row: datetime.datetime.strptime(row['date']+ ':' + row['time'], '%Y.%m.%d:%H:%M'), axis=1)
DatetimeIndex [source] Immutable ndarray of datetime64 data, represented internally as int64, and which can be boxed to Timestamp objects that are subclasses of datetime and carry metadata such as frequency information.
For non-standard datetime parsing, use pd.to_datetime after pd.read_csv . To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True . See Parsing a CSV with mixed timezones for more.
pandas supports converting integer or float epoch times to Timestamp and DatetimeIndex . The default unit is nanoseconds, since that is how Timestamp objects are stored internally.
Read a CSV File In this case, the Pandas read_csv() function returns a new DataFrame with the data and labels from the file data. csv , which you specified with the first argument. This string can be any valid path, including URLs.
Dateutil is unable to parse your data correctly but you can do it after loading like so using strptime
:
import datetime
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['date']+ ':' + row['time'], '%Y.%m.%d:%H:%M'), axis=1)
This will yield you the 'DateTime' column as datetime64[ns]
and you can use it as your index
EDIT
Hmm.. interestingly when I do this it works:
df = pd.read_csv(r'c:\data\temp.txt', parse_dates={'datetime':['date','time']}, index_col='datetime')
Could you see what happens when you drop the column names from the parameters to read_csv
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With