I have a pandas.Series with a two-level pandas.MultiIndex. The first level holds dates. I have another DatetimeIndex with values that are close to some of the dates in my series.index.levels[0]. I want to reindex my series with the dates in the "other" DatetimeIndex that are close enough to existing dates in the index. Suppose that by "close" I mean within 2 days.
import pandas as pd
import numpy as np
np.random.seed([3, 1415])
chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
#     Equal Date    + 3 Days      - 1 Day       + 2 Days
i0 = pd.to_datetime(
    [ '2018-11-30', '2018-12-16', '2018-12-30', '2019-01-17' ])
i1 = pd.to_datetime(
    ['2018-10-31', '2018-11-30', '2018-12-13', '2018-12-31', '2019-01-15', '2019-01-31'])
#                  Include       Skip          Include       Include
lvl0 = i0.repeat(5)
lvl1 = np.concatenate(
[np.random.choice([*chars], size=5, replace=False) for _ in range(4)])
midx = pd.MultiIndex.from_tuples([*zip(lvl0, lvl1)], names=['date', 'ID'])
s0 = pd.Series(np.arange(4).repeat(5), midx, name='stuff')
s0
date ID
2018-11-30 S 0
O 0
J 0
H 0
D 0
2018-12-16 Q 1
B 1
A 1
S 1
P 1
2018-12-30 U 2
S 2
A 2
J 2
L 2
2019-01-17 K 3
U 3
V 3
S 3
H 3
Name: stuff, dtype: int64
Expected result. Note: it has the same dtype as the original.
date ID
2018-11-30 S 0
O 0
J 0
H 0
D 0
2018-12-31 U 2
S 2
A 2
J 2
L 2
2019-01-15 K 3
U 3
V 3
S 3
H 3
Name: stuff, dtype: int64
tol = pd.Timedelta('2D')
# 0. This should be the same as the `i0` I used to set up
# But supposing that wasn't available, we would...
i0 = s0.index.levels[0]
# 1. Broadcast date differences
# 2. Take the absolute value
# 3. Find the position of minimum absolute value for each row
# 4. Define a proposal of new index level values with those positions
i_proposal = i1[np.abs(np.subtract.outer(i0, i1)).argmin(1)]
# 5. Use proposal to get which ones are within the
# tolerance of 2 days
i_final = i_proposal[np.abs(i_proposal - i0) <= tol]
# 6. set_levels with proposal,
# because at this point there is a one-to-one correspondence
# (the `inplace` argument was deprecated in pandas 1.2 and removed in 2.0)
s0.index.set_levels(i_proposal, level=0, inplace=True)
# 7. use `loc` to pull out the final ones
s0.loc[i_final]
date ID
2018-11-30 S 0
O 0
J 0
H 0
D 0
2018-12-31 U 2
S 2
A 2
J 2
L 2
2019-01-15 K 3
U 3
V 3
S 3
H 3
Name: stuff, dtype: int64
Two things bother me about this attempt. First, it uses inplace on s0.index. Second, the broadcast step is O(len(i0) * len(i1)). There should be an O(len(i0) + len(i1)) solution. Can anyone think of a better way to do this?
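One possible direction for avoiding the len(i0) * len(i1) difference matrix is a binary search over the sorted candidate dates. The sketch below is not taken from the question or the answers; the helper name nearest_within is hypothetical, and it assumes the candidate index is sorted, unique, and has at least two entries. It runs in O(len(i0) * log(len(i1))) time, so it is not the strictly linear bound asked for, but it never materializes the full matrix.

```python
import numpy as np
import pandas as pd


def nearest_within(old, new, tol):
    """Position in `new` of the nearest date to each date in `old`, or -1.

    Hypothetical helper: assumes `new` is sorted, unique, and has >= 2 entries.
    """
    old_v, new_v = old.values, new.values
    # Binary search: first position in `new` that is >= each old date,
    # clipped so that both neighbours (right - 1 and right) exist.
    right = np.searchsorted(new_v, old_v).clip(1, len(new_v) - 1)
    left = right - 1
    # Pick whichever neighbour is closer (ties go to the earlier date).
    use_left = (old_v - new_v[left]) <= (new_v[right] - old_v)
    pos = np.where(use_left, left, right)
    # Discard matches that fall outside the tolerance.
    within = np.abs(new_v[pos] - old_v) <= tol.to_timedelta64()
    return np.where(within, pos, -1)


i0 = pd.to_datetime(['2018-11-30', '2018-12-16', '2018-12-30', '2019-01-17'])
i1 = pd.to_datetime(['2018-10-31', '2018-11-30', '2018-12-13',
                     '2018-12-31', '2019-01-15', '2019-01-31'])

pos = nearest_within(i0, i1, pd.Timedelta('2D'))
```

Here pos holds, for each date in i0, the position of its match in i1, with -1 marking the dates that have no candidate within two days.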
This is very close to what cs95 did by using reindex
s,y=i1.reindex(s0.index.levels[0],tolerance=pd.Timedelta(days=2),method='nearest')
s0.loc[s[y!=-1]]
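To make the role of the -1 entries in the indexer concrete, here is a minimal sketch that calls Index.get_indexer directly on the same dates as in the question. The variable names existing, other, matched, kept_old and kept_new are made up for illustration.

```python
import pandas as pd

existing = pd.to_datetime(['2018-11-30', '2018-12-16', '2018-12-30', '2019-01-17'])
other = pd.to_datetime(['2018-10-31', '2018-11-30', '2018-12-13',
                        '2018-12-31', '2019-01-15', '2019-01-31'])

# For each date in `existing`: the position of the nearest date in `other`,
# or -1 when no date in `other` lies within the tolerance.
indexer = other.get_indexer(existing, method='nearest',
                            tolerance=pd.Timedelta(days=2))

matched = indexer != -1
kept_old = existing[matched]        # dates in `existing` that found a partner
kept_new = other[indexer[matched]]  # the partner dates from `other`
```

With this data, 2018-12-16 is three days from its nearest candidate, so it is the only date that gets -1.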
If you need to change the first index level to the matching dates from i1:
s=s0.index.levels[0].values
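# use .values here: 2-D indexing such as i1[:, None] is not supported on an Index in recent pandas
t=abs((i1.values[:,None]-s))/np.timedelta64(1, 'D')<=2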
f=s0.loc[s[t.any(0)]].reset_index(level=1)
f.index=f.index.map(dict(zip(s[t.any(0)],i1[t.any(1)])))
f.set_index('ID',append=True,inplace=True)
f
Out[458]:
stuff
date ID
2018-11-30 S 0
O 0
J 0
H 0
D 0
2018-12-31 U 2
S 2
A 2
J 2
L 2
2019-01-15 K 3
U 3
V 3
S 3
H 3
I reconfigured it this way:
lvl0, lvl1 = s0.index.levels
_, indexer = i1.reindex(lvl0, tolerance=tol, method='nearest')
newlvl0 = i1[indexer]
msklvl0 = newlvl0[indexer != -1]
newidx = s0.index.set_levels([newlvl0, lvl1])
s0.set_axis(newidx, inplace=False).loc[msklvl0]
date ID
2018-11-30 S 0
O 0
J 0
H 0
D 0
2018-12-31 U 2
S 2
A 2
J 2
L 2
2019-01-15 K 3
U 3
V 3
S 3
H 3
Name: stuff, dtype: int64
This is a merge_asof problem. I'd do it like this:
res = pd.merge_asof(
s0.to_frame(), # should be first, simulate how='left'
i1.to_frame(), # should be second
tolerance=pd.Timedelta(days=2), # two days tolerance
left_on='date', # select index level for s0
right_index=True,
direction='nearest') # default is 'backward', not as useful
s0[res[0].notna()]
date ID
2018-11-30 S 0
O 0
J 0
H 0
D 0
2018-12-30 U 2
S 2
A 2
J 2
L 2
2019-01-17 K 3
U 3
V 3
S 3
H 3
Name: stuff, dtype: int64
Note that this will retain the indices from s0 (which may not be what you want).
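To illustrate why direction='nearest' matters here, the sketch below runs merge_asof on a flat frame holding the four dates from the question, once with the default 'backward' direction and once with 'nearest'. The frame and column names (left, right, newdate, backward, nearest) are chosen for illustration only.

```python
import pandas as pd

left = pd.DataFrame({
    'date': pd.to_datetime(['2018-11-30', '2018-12-16', '2018-12-30', '2019-01-17']),
    'stuff': [0, 1, 2, 3],
})
right = pd.DataFrame({
    'newdate': pd.to_datetime(['2018-10-31', '2018-11-30', '2018-12-13',
                               '2018-12-31', '2019-01-15', '2019-01-31']),
})

# Both frames must be sorted by their key columns for merge_asof.
kw = dict(left_on='date', right_on='newdate', tolerance=pd.Timedelta(days=2))

# 'backward' only considers right-hand dates at or before the left-hand date.
backward = pd.merge_asof(left, right, direction='backward', **kw)

# 'nearest' considers dates on either side.
nearest = pd.merge_asof(left, right, direction='nearest', **kw)
```

With 'backward', 2018-12-30 cannot see 2018-12-31 and ends up unmatched; with 'nearest' it matches, and only 2018-12-16 is left with a missing newdate.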
This got me what I wanted
tol = pd.Timedelta(days=2)
right = pd.DataFrame(dict(newdate=i1), i1)
left = s0.to_frame()
kw = dict(
left=left, right=right, tolerance=tol,
left_on='date', right_index=True, direction='nearest'
)
res = pd.merge_asof(**kw)
res = res.dropna() \
.reset_index() \
.set_index(['newdate', 'ID']) \
.stuff.rename_axis(['date', 'ID'])
res
date ID
2018-11-30 S 0
O 0
J 0
H 0
D 0
2018-12-31 U 2
S 2
A 2
J 2
L 2
2019-01-15 K 3
U 3
V 3
S 3
H 3
Name: stuff, dtype: int64