I have the following dataframe. It is OHLC one-minute data. Obviously I need the T column to become and index in order to use time-series functionallity
C H L O T V
13712 6873.0 6873.0 6873.0 6873.0 2018-01-13T17:17:00 799.448421 
13713 6878.0 6878.0 6875.0 6875.0 2018-01-13T17:18:00 1707.578666 
13714 6880.0 6880.0 6825.0 6825.0 2018-01-13T17:21:00 481.245707 
13715 6876.0 6876.0 6876.0 6876.0 2018-01-13T17:22:00 839.177283 
13716 6870.0 6878.0 6830.0 6878.0 2018-01-13T17:23:00 4336.830277 
I used:
df['T'] = pd.to_datetime(df['T'])
So far so good! The T column is now recognised as a date
Check:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 13717 entries, 1970-01-01 00:00:00 to 1970-01-01 00:00:00.000013716
Data columns (total 7 columns):
BV    13717 non-null float64
C     13717 non-null float64
H     13717 non-null float64
L     13717 non-null float64
O     13717 non-null float64
T     13717 non-null datetime64[ns]
V     13717 non-null float64
dtypes: datetime64[ns](1), float64(6)
memory usage: 857.3 KB
And now comes the fun and unexplainable part:
df.set_index(df['T'])
   C H L O T V
T
2018-01-03 17:27:00 5710.0 5710.0 5663.0 5667.0 2018-01-03 17:27:00 3863.030204 
2018-01-03 17:28:00 5704.0 5710.0 5663.0 5710.0 2018-01-03 17:28:00 1208.627542 
2018-01-03 17:29:00 5699.0 5699.0 5663.0 5663.0 2018-01-03 17:29:00 1755.123688 
Still looks good, but when I check the type of index I get:
RangeIndex(start=0, stop=13717, step=1)
And now if I try:
df.index = pd.to_datetime(df.index)
I end up with:
DatetimeIndex([          '1970-01-01 00:00:00',
               '1970-01-01 00:00:00.000000001',
               '1970-01-01 00:00:00.000000002',
               '1970-01-01 00:00:00.000000003',
               '1970-01-01 00:00:00.000000004' and so on...
which is evidently wrong.
The questions are: 1. Why don't I get the normal DateTimeIndex if I am converting a date to index?
Thanks!
To convert the index of a DataFrame to DatetimeIndex , use Pandas' to_datetime(~) method.
DatetimeIndex [source] Immutable ndarray of datetime64 data, represented internally as int64, and which can be boxed to Timestamp objects that are subclasses of datetime and carry metadata such as frequency information.
If input data are csv the simpliest is use parameters parse_dates and index_col in read_csv:
df = pd.read_csv(file, parse_dates=['T'], index_col=['T'])
If not, then use your solution, don't forget assign back output of set_index and if need drop column T also after DatetimeIndex use T instead df['T']:
df['T'] = pd.to_datetime('T')
df = df.set_index('T')
#alternative solution
#df.set_index('T', inplace=True)
Why don't I get the normal DateTimeIndex if I am converting a date to index?
Because your index is default (0,1,2..), so df.index = pd.to_datetime(df.index) parse integerss like ns and get weird datetimes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With