
Arithmetic operations on datetime index in pandas

In pandas, you can access specific positions of a time series either by classical integer position (row-based) indexing or by datetime-based indexing. An integer index can be manipulated with basic arithmetic: for example, if I have an integer_index into a time series with a frequency of 12 hours and I want the entry exactly one day earlier, I can simply use integer_index - 2. However, real-world data are not always perfect, and sometimes rows are missing. In that case this method fails, and it would be helpful to use datetime-based indexing and subtract, for example, one day from the index. How can I do this?

Sample script:

# generate a sample time series
import pandas as pd
s = pd.Series(["A", "B", "C", "D", "E"], index=pd.date_range("2000-01-01", periods=5, freq="12h"))
print(s)

2000-01-01 00:00:00    A
2000-01-01 12:00:00    B
2000-01-02 00:00:00    C
2000-01-02 12:00:00    D
2000-01-03 00:00:00    E
Freq: 12H, dtype: object

# these two indices should access the same value ("C")
integer_index = 2
date_index = "2000-01-02 00:00"

print(s.iloc[integer_index])  # prints "C"
print(s[date_index])  # prints "C"

# I can access the value one day earlier by subtracting 2 from the integer index
print(s.iloc[integer_index - 2])  # prints "A"

# how can I subtract one day from the date index?
print(s[date_index - 1])  # raises a TypeError (str minus int)

The background to this question can be found in an earlier submission of mine here:

Fill data gaps with average of data from adjacent days

where user JohnE found a workaround to my problem that uses integer position based indexing. He makes sure that I have equally spaced data by resampling the time series.
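
For illustration, resampling onto a regular grid might look like the sketch below. It assumes `Series.asfreq` and a made-up gap in the sample series; it is not necessarily the code JohnE proposed.

```python
import pandas as pd

full = pd.Series(
    ["A", "B", "C", "D", "E"],
    index=pd.date_range("2000-01-01", periods=5, freq="12h"),
)

# Hypothetical gap: remove one row to mimic missing data
gappy = full.drop(pd.Timestamp("2000-01-01 12:00"))

# asfreq reinserts the missing timestamp (filled with NaN),
# so the rows are 12 hours apart again
regular = gappy.asfreq("12h")

integer_index = 2
print(regular.iloc[integer_index])      # the 2000-01-02 00:00 row
print(regular.iloc[integer_index - 2])  # the row one day earlier
```

Once the spacing is regular again, integer arithmetic on positions is meaningful, at the cost of carrying NaN placeholder rows.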

Fred S asked Sep 19 '14 08:09




1 Answer

Your index isn't made of strings; it's a DatetimeIndex, which means you can index it with datetime objects rather than with a string that merely looks like a date.

The code below converts date_index into a datetime object and then subtracts timedelta(days=1) from it to step back one day.

# generate a sample time series
import pandas as pd
from datetime import datetime, timedelta

s = pd.Series(["A", "B", "C", "D", "E"], index=pd.date_range("2000-01-01", periods=5, freq="12h"))
print(s)

# these two indices should access the same value ("C")
integer_index = 2
# Converts the string into a datetime object
date_index = datetime.strptime("2000-01-02 00:00", "%Y-%m-%d %H:%M")
print(date_index) # 2000-01-02 00:00:00

print(s.iloc[integer_index])  # prints "C"
print(s[date_index])  # prints "C"

print(s.iloc[integer_index - 2])  # prints "A"

one_day = timedelta(days=1)
print(s[date_index - one_day]) # prints "A"
print(date_index - one_day) # 2000-01-01 00:00:00
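
The same subtraction can also be written with pandas' own types, which avoids the standard-library import. The sketch below assumes `pd.Timestamp` and `pd.Timedelta`, and uses `Series.get` to return a default instead of raising a KeyError when the target row is missing, which is the situation the question describes. The dropped row is made up for illustration.

```python
import pandas as pd

s = pd.Series(
    ["A", "B", "C", "D", "E"],
    index=pd.date_range("2000-01-01", periods=5, freq="12h"),
)

date_index = pd.Timestamp("2000-01-02 00:00")
one_day = pd.Timedelta(days=1)

# Label-based lookup one day earlier
previous = s[date_index - one_day]
print(previous)

# Hypothetical gap: drop the 2000-01-01 12:00 row to mimic missing data
gappy = s.drop(pd.Timestamp("2000-01-01 12:00"))

# Datetime arithmetic still finds the right row although positions have shifted
still_previous = gappy[date_index - one_day]
print(still_previous)

# If the target row itself is missing, get() returns the default instead of raising
missing = gappy.get(date_index - pd.Timedelta(hours=12), default=None)
print(missing)
```

Unlike integer positions, the label computed here does not depend on how many rows precede it, so a gap elsewhere in the series does not shift the result.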
Ffisegydd answered Sep 24 '22 13:09
