I have the following dataframe. It is OHLC one-minute data. Obviously I need the T column to become and index in order to use time-series functionallity
C H L O T V
13712 6873.0 6873.0 6873.0 6873.0 2018-01-13T17:17:00 799.448421
13713 6878.0 6878.0 6875.0 6875.0 2018-01-13T17:18:00 1707.578666
13714 6880.0 6880.0 6825.0 6825.0 2018-01-13T17:21:00 481.245707
13715 6876.0 6876.0 6876.0 6876.0 2018-01-13T17:22:00 839.177283
13716 6870.0 6878.0 6830.0 6878.0 2018-01-13T17:23:00 4336.830277
I used:
df['T'] = pd.to_datetime(df['T'])
So far so good! The T column is now recognised as a date
Check:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 13717 entries, 1970-01-01 00:00:00 to 1970-01-01 00:00:00.000013716
Data columns (total 7 columns):
BV 13717 non-null float64
C 13717 non-null float64
H 13717 non-null float64
L 13717 non-null float64
O 13717 non-null float64
T 13717 non-null datetime64[ns]
V 13717 non-null float64
dtypes: datetime64[ns](1), float64(6)
memory usage: 857.3 KB
And now comes the fun and unexplainable part:
df.set_index(df['T'])
C H L O T V
T
2018-01-03 17:27:00 5710.0 5710.0 5663.0 5667.0 2018-01-03 17:27:00 3863.030204
2018-01-03 17:28:00 5704.0 5710.0 5663.0 5710.0 2018-01-03 17:28:00 1208.627542
2018-01-03 17:29:00 5699.0 5699.0 5663.0 5663.0 2018-01-03 17:29:00 1755.123688
Still looks good, but when I check the type of index I get:
RangeIndex(start=0, stop=13717, step=1)
And now if I try:
df.index = pd.to_datetime(df.index)
I end up with:
DatetimeIndex([ '1970-01-01 00:00:00',
'1970-01-01 00:00:00.000000001',
'1970-01-01 00:00:00.000000002',
'1970-01-01 00:00:00.000000003',
'1970-01-01 00:00:00.000000004' and so on...
which is evidently wrong.
The questions are: 1. Why don't I get the normal DateTimeIndex if I am converting a date to index?
Thanks!
To convert the index of a DataFrame to DatetimeIndex , use Pandas' to_datetime(~) method.
DatetimeIndex [source] Immutable ndarray of datetime64 data, represented internally as int64, and which can be boxed to Timestamp objects that are subclasses of datetime and carry metadata such as frequency information.
If input data are csv
the simpliest is use parameters parse_dates
and index_col
in read_csv
:
df = pd.read_csv(file, parse_dates=['T'], index_col=['T'])
If not, then use your solution, don't forget assign back output of set_index
and if need drop column T
also after DatetimeIndex
use T
instead df['T']
:
df['T'] = pd.to_datetime('T')
df = df.set_index('T')
#alternative solution
#df.set_index('T', inplace=True)
Why don't I get the normal DateTimeIndex if I am converting a date to index?
Because your index is default (0,1,2..
), so df.index = pd.to_datetime(df.index)
parse integers
s like ns
and get weird datetimes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With