Finding last possible index value to satisfy filtering requirements

Q: How do you pick up the last value in a column?

If we wish to get only the first column number, we can use the MIN function to extract just the first column number, which will be the lowest number in the array. Once we get the first column, we can just add the total columns in the range and subtract 1, to get the last column number.

Q: How do you find the last occurrence of a string in Excel?

You can use any character you want, as long as it is unique and does not already appear in the string. FIND("@",SUBSTITUTE(A2,"/","@",LEN(A2)-LEN(SUBSTITUTE(A2,"/",""))),1) returns the position of the last forward slash.

Tags: python, python-3.x, pandas

I have a dataframe as follows:

Slot      Time     Last      Next
1         9:30               9:37
2         9:35     9:32      9:40
3         9:40     9:37      9:52
4         9:45     9:41      9:47
5         9:50     9:47      10:00

What I want to do here is create two new columns, 'min' and 'max'. For each row, 'min' should hold the last Slot whose Time is strictly earlier than that row's Last, and 'max' should hold the last Slot whose Time is strictly earlier than that row's Next.

The desired output here should be:

df['min'] = [NaN,1,2,3,4]

and

df['max'] = [2,2,5,4,5]

I tried something along the lines of

for index, row in df.iterrows():
    row['min'] = df[df['Time'] < row['Last']]['Slot']

but got an empty list. Any help is greatly appreciated. Thanks!


asked Nov 30 '18 08:11

Adrian Y

3 Answers

First, I converted the time columns to datetime. Otherwise the values are compared as strings, character by character, so '10:00' sorts before '9:37':

# df_ is the original dataframe; convert a copy so df_ keeps the raw strings
df = df_.copy()
df.loc[:, 'Time':'Next'] = (df.loc[:, 'Time':'Next']
                              .apply(pd.to_datetime, errors='coerce'))
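
To illustrate why the conversion matters, here is a minimal sketch comparing the same two times as strings and as parsed datetimes. The variable names and the explicit `format='%H:%M'` are assumptions made for this illustration, not part of the answer's code.

```python
import pandas as pd

# Lexicographic comparison: '1' sorts before '9', so the later time looks "smaller"
string_result = '10:00' < '9:30'

# After parsing, the comparison is chronological
parsed = pd.to_datetime(['10:00', '9:30'], format='%H:%M')
time_result = parsed[0] < parsed[1]

print(string_result, time_result)  # True False
```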

For the min column you can do:

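# Index label of the last row with Time < x, or NaN when no row qualifies
min_vals = [(df['Time'] < x)[::-1].idxmax()
            if any(df['Time'] < x) else np.nan for x in df['Last']]
# reindex turns a NaN label into a NaN slot; .loc raises KeyError
# for missing labels in pandas >= 1.0
df_['min'] = df['Slot'].reindex(min_vals).values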

And for the max :

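# Same lookup, this time against the Next column
max_vals = [(df['Time'] < x)[::-1].idxmax()
            if any(df['Time'] < x) else np.nan for x in df['Next']]
# reindex turns a NaN label into a NaN slot; .loc raises KeyError
# for missing labels in pandas >= 1.0
df_['max'] = df['Slot'].reindex(max_vals).values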

Which would give you:

print(df_)

    Slot  Time  Last   Next  min  max
0     1  9:30     -   9:37  NaN    2
1     2  9:35  9:32   9:40  1.0    2
2     3  9:40  9:37   9:52  2.0    5
3     4  9:45  9:41   9:47  3.0    4
4     5  9:50  9:47  10:00  4.0    5


answered Oct 29 '22 06:10

yatu

I tried this,

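x = []
y = []
for index, row in df.iterrows():
    # slots whose Time is strictly earlier than this row's Last / Next
    t = df[df['Time'] < row['Last']]['Slot'].values
    s = df[df['Time'] < row['Next']]['Slot'].values
    # keep the last matching slot, or NaN when nothing matches
    if len(t) == 0:
        x.append(np.nan)
    else:
        x.append(t[-1])

    if len(s) == 0:
        y.append(np.nan)
    else:
        y.append(s[-1])
df['min'] = x
df['max'] = y
print(df)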

O/P:

   Slot                Time                Last                Next  min  max
0     1 2018-11-30 09:30:00                 NaT 2018-11-30 09:37:00  NaN    2
1     2 2018-11-30 09:35:00 2018-11-30 09:32:00 2018-11-30 09:40:00  1.0    2
2     3 2018-11-30 09:40:00 2018-11-30 09:37:00 2018-11-30 09:52:00  2.0    5
3     4 2018-11-30 09:45:00 2018-11-30 09:41:00 2018-11-30 09:47:00  3.0    4
4     5 2018-11-30 09:50:00 2018-11-30 09:47:00 2018-11-30 10:00:00  4.0    5

Note: This is not the idiomatic pandas way to solve it. Since you attempted a loop, I kept a loop-based approach to show the idea. Expect it to be slow on large dataframes.
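
For comparison, a loop-free version can be sketched with `np.searchsorted`. This is not the approach used in the answer above, and it assumes the Time column is sorted in ascending order, as it is in the question's sample. The helper name `last_slot_before` and the explicit `format='%H:%M'` are hypothetical choices for this sketch.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the missing Last value is None
df = pd.DataFrame({
    'Slot': [1, 2, 3, 4, 5],
    'Time': ['9:30', '9:35', '9:40', '9:45', '9:50'],
    'Last': [None, '9:32', '9:37', '9:41', '9:47'],
    'Next': ['9:37', '9:40', '9:52', '9:47', '10:00'],
})
cols = ['Time', 'Last', 'Next']
df[cols] = df[cols].apply(pd.to_datetime, format='%H:%M', errors='coerce')


def last_slot_before(times, bounds, slots):
    """Hypothetical helper: slot of the last time strictly below each bound.

    Assumes `times` is sorted ascending. Returns NaN where the bound is
    missing or no time is earlier than it.
    """
    t = times.to_numpy()
    b = bounds.to_numpy()
    s = slots.to_numpy(dtype=float)
    out = np.full(len(b), np.nan)
    ok = ~pd.isna(b)
    # side='left' yields the number of entries in t strictly less than each bound
    pos = np.searchsorted(t, b[ok], side='left')
    # pos == 0 means nothing is earlier; s[pos - 1] is masked out in that case
    out[ok] = np.where(pos > 0, s[pos - 1], np.nan)
    return out


df['min'] = last_slot_before(df['Time'], df['Last'], df['Slot'])
df['max'] = last_slot_before(df['Time'], df['Next'], df['Slot'])
print(df[['Slot', 'min', 'max']])
```

Because each lookup is a binary search, the work grows roughly as n log n instead of scanning the whole Time column once per row. If Time is not sorted, the frame would need sorting first.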

answered Oct 29 '22 05:10

Mohamed Thasin ah

This is an occasion when numba can be helpful in providing an efficient solution. This is an explicit for loop, but JIT-compiled for performance.

import numpy as np
import pandas as pd
from numba import njit

# convert to timedelta
time_cols = ['Time','Last','Next']
df[time_cols] = (df[time_cols] + ':00').apply(pd.to_timedelta)

# define loopy algorithm
@njit
def get_idx(times, comps, slots):
    n = len(times)
    res = np.empty(n)
    for i in range(n):
        mycomp = comps[i]
        if mycomp != mycomp:
            res[i] = np.nan
        else:
            for j in range(n, 0, -1):
                if times[j-1] < mycomp:
                    res[i] = slots[j-1]
                    break
            else:
                res[i] = np.nan
    return res

# extract timedeltas as seconds    
arr = df[time_cols].apply(lambda x: x.dt.total_seconds()).values

# apply logic    
df['min'] = get_idx(arr[:, 0], arr[:, 1], df['Slot'].values)
df['max'] = get_idx(arr[:, 0], arr[:, 2], df['Slot'].values)

Result

print(df)

   Slot     Time     Last     Next  min  max
0     1 09:30:00      NaT 09:37:00  NaN  2.0
1     2 09:35:00 09:32:00 09:40:00  1.0  2.0
2     3 09:40:00 09:37:00 09:52:00  2.0  5.0
3     4 09:45:00 09:41:00 09:47:00  3.0  4.0
4     5 09:50:00 09:47:00 10:00:00  4.0  5.0
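
The `mycomp != mycomp` test in get_idx relies on NaN being unequal to itself: a missing timedelta (NaT) turns into NaN once it is converted to seconds. A small sketch of that behaviour, using made-up values and variable names chosen for this illustration:

```python
import pandas as pd

# One valid timedelta and one missing value
td = pd.Series(pd.to_timedelta(['09:30:00', None]))

secs = td.dt.total_seconds().to_numpy()  # NaT becomes NaN
is_missing = secs != secs                # True only where the value is NaN

print(secs, is_missing)
```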

Performance benchmarking

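The gain is large on bigger dataframes. With the example frame repeated 1,000 times (5,000 rows), the list-comprehension version takes about 8.85 s per loop and the numba version about 5.02 ms: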

def nix(df):
    # list-comprehension approach from the answer above
    min_vals = [(df['Time'] < x)[::-1].idxmax()
                if any(df['Time'] < x) else np.nan for x in df['Last']]
    # reindex tolerates the NaN labels; .loc raises KeyError in pandas >= 1.0
    df['min'] = df['Slot'].reindex(min_vals).values

    max_vals = [(df['Time'] < x)[::-1].idxmax()
                if any(df['Time'] < x) else np.nan for x in df['Next']]
    df['max'] = df['Slot'].reindex(max_vals).values
    return df

def jpp(df):
    arr = df[time_cols].apply(lambda x: x.dt.total_seconds()).values
    df['min'] = get_idx(arr[:, 0], arr[:, 1], df['Slot'].values)
    df['max'] = get_idx(arr[:, 0], arr[:, 2], df['Slot'].values)
    return df

df = pd.concat([df]*1000, ignore_index=True)

%timeit nix(df.copy())  # 8.85 s per loop
%timeit jpp(df.copy())  # 5.02 ms per loop

Related: Efficiently return the index of the first value satisfying condition in array.

answered Oct 29 '22 05:10

jpp
