
Converting a DateTime Index value to an Index Number

I have a dataframe with a datetime index, and I get the first valid index using series.first_valid_index(). It returns the datetime of the first non-NaN value, which is what I'm looking for. However:

Is there a way to get the index number that the datetime value corresponds to? For example, it returns 2018-07-16, but I'd like to know that it's the 18th row of the dataframe.

If not, is there a way to count the rows from the beginning of the dataframe to that index value?

novawaly asked Sep 27 '18 17:09



2 Answers

TLDR: To map a given index label (here, a value from a DatetimeIndex) to its integer position, use get_loc. If you only want the integer position of the first non-NaN value in the Series, use argmax on the underlying numpy array.

Setup

import numpy as np
import pandas as pd

np.random.seed(3483203)

# Five daily rows, each randomly either 0 or NaN
df = pd.DataFrame(
    np.random.choice([0, np.nan], 5),
    index=pd.date_range(start='2018-01-01', freq='1D', periods=5)
)

              0
2018-01-01  NaN
2018-01-02  NaN
2018-01-03  0.0
2018-01-04  NaN
2018-01-05  NaN

Use pandas.Index.get_loc here, which is a general method that returns the integer position of a given label (0-based, so add 1 for a row count):

>>> idx = df[0].first_valid_index()
>>> idx
Timestamp('2018-01-03 00:00:00', freq='D')
>>> df.index.get_loc(idx)
2
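As a self-contained sketch of the same idea, the helper below wraps first_valid_index and get_loc and guards against a series with no valid values. The function name first_valid_position and the sample data are made up for illustration; they are not part of the original answer, and the sketch assumes the index labels are unique.

```python
import numpy as np
import pandas as pd


def first_valid_position(series):
    """Return the 0-based row position of the first non-NaN value, or None.

    Hypothetical helper for illustration; assumes unique index labels,
    since get_loc only returns a plain int in that case.
    """
    label = series.first_valid_index()
    if label is None:  # every value is NaN, or the series is empty
        return None
    return series.index.get_loc(label)


# Deterministic sample data: the first non-NaN value is on 2018-01-03
s = pd.Series(
    [np.nan, np.nan, 0.0, np.nan, np.nan],
    index=pd.date_range(start="2018-01-01", freq="D", periods=5),
)

pos = first_valid_position(s)
print(pos)      # 0-based position: 2
print(pos + 1)  # 1-based row number: 3
```

The position is 0-based, so the "18th row" in the question would correspond to a returned value of 17.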

If you want to avoid finding the datetime index at all, you may use argmax on the underlying numpy array:

>>> np.argmax(~np.isnan(df[0].values))
2
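One caveat with the argmax approach: np.argmax returns 0 when the boolean array contains no True value, so an all-NaN column is indistinguishable from one whose first row is valid. A minimal sketch of a guarded version follows; the helper name first_valid_position_np is an assumption for illustration only.

```python
import numpy as np
import pandas as pd


def first_valid_position_np(series):
    """Return the 0-based position of the first non-NaN value, or None.

    Hypothetical helper for illustration; works on positions only and
    never looks at the index labels.
    """
    valid = series.notna().to_numpy()  # boolean mask, True where a value exists
    if not valid.any():                # without this check argmax would return 0
        return None
    return int(np.argmax(valid))       # argmax finds the first True


s = pd.Series(
    [np.nan, np.nan, 0.0, np.nan, np.nan],
    index=pd.date_range(start="2018-01-01", freq="D", periods=5),
)
all_nan = pd.Series([np.nan, np.nan])

print(first_valid_position_np(s))        # 2
print(first_valid_position_np(all_nan))  # None
```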
user3483203 answered Sep 29 '22 18:09


I would try the following (untested):

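# reset_index() moves the datetime index into an ordinary column (named
# "index" when the index has no name) and replaces it with a default
# integer index 0..len(df)-1. A separate set_index(range(...)) call is
# not needed: set_index expects column labels or arrays, and passing a
# bare range may raise an error in recent pandas versions.
df = df.reset_index()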
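To connect this back to the question, here is a small sketch of how a reset index yields the row number directly: once the index is the default integer range, first_valid_index returns a position instead of a timestamp. The column name "value" and the sample data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Deterministic sample data; the column name "value" is made up
df = pd.DataFrame(
    {"value": [np.nan, np.nan, 0.0, np.nan, np.nan]},
    index=pd.date_range(start="2018-01-01", freq="D", periods=5),
)

# The unnamed datetime index becomes a column called "index"
flat = df.reset_index()

# With a 0..n-1 integer index, the first valid label is the row position
row = flat["value"].first_valid_index()
print(row)                     # 2
print(flat.loc[row, "index"])  # the timestamp stored on that row
```

This copies the dataframe, so for a one-off lookup get_loc on the original index is usually the lighter option.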
Maeaex1 answered Sep 29 '22 18:09