Applying last valid index mask to dataframe to get last valid values

Tags: performance, python, pandas, numpy

I have a dataframe that looks like the following:

    s1        s2       s3       s4
0   v1        v2       v3       v4
0   v5        v6       v7       np.nan
0   v8      np.nan     v9       np.nan
0   v10     np.nan     np.nan   np.nan

Essentially, each column holds numerical values from the top down until, at a row that varies from column to column, it switches to np.nan for all remaining rows.

I've used .apply(pd.Series.last_valid_index) to get, for each column, the last index at which the values are still numerical. However, I'm not sure of the most efficient way to retrieve a series containing the actual value at each last valid index.
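To see why the result of `.apply(pd.Series.last_valid_index)` can't be used directly here, below is a minimal sketch on a reconstruction of the example frame. It assumes numeric stand-ins (1.0 to 10.0) for v1..v10 and keeps the all-zero index shown above. `last_valid_index` returns an index *label*, not a row position, so with duplicated labels every column reports 0.

```python
import numpy as np
import pandas as pd

# Assumption: numeric stand-ins for v1..v10, keeping the all-zero index
# shown in the question.
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
    },
    index=[0, 0, 0, 0],
)

# last_valid_index returns the index label of the last non-NaN value.
labels = df.apply(pd.Series.last_valid_index)
print(labels.tolist())  # every label is 0, so it cannot identify the row

# After resetting to a default 0..n-1 index, labels equal row positions.
positions = df.reset_index(drop=True).apply(pd.Series.last_valid_index)
print(positions.tolist())
```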

Ideally I'd be able to derive a series that looks like:

        value
    s1  v10
    s2  v6
    s3  v9
    s4  v4

or as a dataframe that looks like

        s1   s2  s3  s4
    0   v10  v6  v9  v4

Many thanks!

asked Jun 14 '18 15:06

wingsoficarus116

2 Answers

This is one way using NumPy indexing:

import numpy as np
import pandas as pd

# ensure index is normalised (0..n-1), so index labels match row positions
df = df.reset_index(drop=True)

# calculate the last valid index of each column
idx = df.apply(pd.Series.last_valid_index)

# create result using NumPy indexing
res = pd.Series(df.values[idx, np.arange(df.shape[1])],
                index=df.columns,
                name='value')

print(res)

s1    v10
s2     v6
s3     v9
s4     v4
Name: value, dtype: object
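The `apply` above still calls a Python function once per column. Since the question is tagged performance, here is a sketch of a fully vectorised variant (my own, not from either answer) that finds every column's last valid row position at once with NumPy `argmax` on the row-reversed `notna()` mask. It works on positions, so no `reset_index` is needed. It assumes numeric stand-ins (1.0 to 10.0) for v1..v10.

```python
import numpy as np
import pandas as pd

# Assumption: numeric stand-ins for v1..v10, with the duplicated 0 index.
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
    },
    index=[0, 0, 0, 0],
)

# Boolean mask of non-missing cells, shape (n_rows, n_cols).
valid = df.notna().to_numpy()
n_rows = valid.shape[0]

# argmax on the row-reversed mask returns the first True counted from the
# bottom; converting back gives the last valid row position per column.
last_pos = n_rows - 1 - valid[::-1].argmax(axis=0)

# Pick one cell per column: (last_pos[j], j).
res = pd.Series(df.to_numpy()[last_pos, np.arange(df.shape[1])],
                index=df.columns, name="value")

# One-row DataFrame form with a 0 index, as asked in the question.
one_row = res.to_frame().T.reset_index(drop=True)
print(res)
print(one_row)
```

A column that is entirely NaN makes `argmax` return 0, so it maps to the last row and yields NaN rather than raising.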

answered Oct 05 '22 14:10

jpp

Here is another way to do it, without resetting the index:

df.apply(lambda x: x[x.notnull()].values[-1])

s1    v10
s2     v6
s3     v9
s4     v4
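A third option, not taken from either answer: forward-fill each column so the last row carries the last valid value, then take that row. `iloc` is positional, so the duplicated index label is harmless, and `iloc[[-1]]` (with a list) keeps the one-row DataFrame shape shown in the question. This is a minimal sketch assuming numeric stand-ins (1.0 to 10.0) for v1..v10.

```python
import numpy as np
import pandas as pd

# Assumption: numeric stand-ins for v1..v10, with the duplicated 0 index.
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
    },
    index=[0, 0, 0, 0],
)

# Propagate each column's last non-NaN value down to the bottom row.
filled = df.ffill()

last_series = filled.iloc[-1].rename("value")  # Series form
last_frame = filled.iloc[[-1]]                 # one-row DataFrame, keeps row label 0
print(last_series)
print(last_frame)
```

Unlike `x[x.notnull()].values[-1]`, which raises an IndexError on a column with no valid values, this returns NaN for such a column.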

answered Oct 05 '22 16:10

sacuL
