Is there a function in numpy to guarantee or rather fix an array such that it is (nonstrictly) increasing along one particular axis? For example, I have the following 2D array:
X = array([[1, 2, 1, 4, 5],
           [0, 3, 1, 5, 4]])
The output of np.foobar(X) should be:
array([[1, 2, 2, 4, 5],
       [0, 3, 3, 5, 5]])
Does foobar exist, or do I need to do that manually by using something like np.diff and some smart indexing?
Use np.maximum.accumulate for a running (accumulated) max value along that axis, which guarantees the non-strictly increasing (non-decreasing) criterion -
np.maximum.accumulate(X,axis=1)
Sample run -
In [233]: X
Out[233]:
array([[1, 2, 1, 4, 5],
       [0, 3, 1, 5, 4]])
In [234]: np.maximum.accumulate(X,axis=1)
Out[234]:
array([[1, 2, 2, 4, 5],
       [0, 3, 3, 5, 5]])
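Since the question mentions np.diff, here is a small sketch of how the result could be checked with it. The variable names Y, is_monotone and dominates are only illustrative and not part of the original answer.

```python
import numpy as np

X = np.array([[1, 2, 1, 4, 5],
              [0, 3, 1, 5, 4]])

# Running maximum along each row.
Y = np.maximum.accumulate(X, axis=1)

# Non-decreasing means every consecutive difference along axis 1 is >= 0.
is_monotone = bool(np.all(np.diff(Y, axis=1) >= 0))

# A running maximum can only raise values, never lower them.
dominates = bool(np.all(Y >= X))

print(Y)
print(is_monotone, dominates)
```

Both checks should hold for any numeric input without NaNs, because each output element is the maximum of all elements up to that position.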
For memory efficiency, we can write the result back into the input array for in-situ changes by passing the input as the out argument.
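A minimal sketch of that in-situ variant, assuming X is a writable NumPy array whose dtype can hold the result (here the same integer array as in the question):

```python
import numpy as np

X = np.array([[1, 2, 1, 4, 5],
              [0, 3, 1, 5, 4]])

# Passing out=X stores the running maximum in X itself
# instead of returning the result as a separate array.
np.maximum.accumulate(X, axis=1, out=X)

print(X)
```

Note that this overwrites the original values, so keep a copy first if they are still needed.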
Runtime tests
Case #1 : Array as input
In [254]: X = np.random.rand(1000,1000)
In [255]: %timeit np.maximum.accumulate(X,axis=1)
1000 loops, best of 3: 1.69 ms per loop
# @cᴏʟᴅsᴘᴇᴇᴅ's pandas soln using df.cummax
In [256]: %timeit pd.DataFrame(X).cummax(axis=1).values
100 loops, best of 3: 4.81 ms per loop
Case #2 : Dataframe as input
In [257]: df = pd.DataFrame(np.random.rand(1000,1000))
In [258]: %timeit np.maximum.accumulate(df.values,axis=1)
1000 loops, best of 3: 1.68 ms per loop
# @cᴏʟᴅsᴘᴇᴇᴅ's pandas soln using df.cummax
In [259]: %timeit df.cummax(axis=1)
100 loops, best of 3: 4.68 ms per loop
pandas offers you the df.cummax method:
import pandas as pd
pd.DataFrame(X).cummax(axis=1).values
array([[1, 2, 2, 4, 5],
       [0, 3, 3, 5, 5]])
It's useful to know that there's a first-class method on hand in case your data is already loaded into a DataFrame.