Applying last valid index mask to dataframe to get last valid values

I have a dataframe that looks like the following:

    s1    s2      s3      s4
0   v1    v2      v3      v4
0   v5    v6      v7      np.nan
0   v8    np.nan  v9      np.nan
0   v10   np.nan  np.nan  np.nan

Essentially, each column holds numerical values from the top down, and at some arbitrary row (different for each column) the values switch to np.nan and stay np.nan for the rest of the column.

I've used .apply(pd.Series.last_valid_index) to get, for each column, the index of the last value that is still numerical. However, I'm not sure of the most efficient way to retrieve a series holding the actual value at each of those last valid indexes.

Ideally I'd be able to derive a series that looks like:

   value
s1 v10
s2 v6
s3 v9
s4 v4

or as a dataframe that looks like

   s1 s2 s3 s4
0 v10 v6 v9 v4

Many thanks!

wingsoficarus116 asked Jun 14 '18 15:06

2 Answers

This is one way using NumPy indexing:

import numpy as np
import pandas as pd

# ensure the index is a default RangeIndex, so labels match row positions
df = df.reset_index(drop=True)

# label of the last valid (non-NaN) entry in each column
idx = df.apply(pd.Series.last_valid_index)

# pick one value per column using NumPy integer indexing
res = pd.Series(df.values[idx, np.arange(df.shape[1])],
                index=df.columns,
                name='value')

print(res)

s1    v10
s2     v6
s3     v9
s4     v4
Name: value, dtype: object
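
For anyone who wants to run this end to end, here is a minimal sketch of the same NumPy indexing approach. The sample frame is an assumption: the question never gives real numbers, so 1.0 to 10.0 stand in for v1 to v10, and the repeated 0 index is copied from the question's printout. It also assumes every column has at least one non-NaN value, since last_valid_index returns None otherwise.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question; numbers stand in for v1..v10.
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
    },
    index=[0, 0, 0, 0],  # repeated labels, as in the question's printout
)

# With a default RangeIndex, index labels double as row positions.
df = df.reset_index(drop=True)

# Label (here also the position) of the last non-NaN entry per column.
idx = df.apply(pd.Series.last_valid_index)

# Pair each row position with its column position to pull one value per column.
res = pd.Series(
    df.to_numpy()[idx.to_numpy(), np.arange(df.shape[1])],
    index=df.columns,
    name="value",
)
print(res)
```

The reset_index step matters here: with the repeated 0 labels, last_valid_index would return 0 for every column, so the labels could not be used as row positions.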
jpp answered Oct 05 '22 14:10


Here is another way to do it, without resetting the index:

df.apply(lambda x: x[x.notnull()].values[-1])

s1    v10
s2     v6
s3     v9
s4     v4
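
As a rough sketch of two closely related spellings: the first is the same per-column idea written with dropna() and iloc, and the second is a different technique (forward-fill, then take the last row) rather than the method above. The sample frame is an assumption, with 1.0 to 10.0 standing in for v1 to v10. Both assume each column has at least one non-NaN value: the dropna variant raises an IndexError on an all-NaN column, while the ffill variant returns NaN for it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question; numbers stand in for v1..v10.
df = pd.DataFrame(
    {
        "s1": [1.0, 5.0, 8.0, 10.0],
        "s2": [2.0, 6.0, np.nan, np.nan],
        "s3": [3.0, 7.0, 9.0, np.nan],
        "s4": [4.0, np.nan, np.nan, np.nan],
    },
    index=[0, 0, 0, 0],  # repeated labels are fine: nothing below uses them
)

# Per column: drop the NaNs, then take the last remaining value by position.
last_by_dropna = df.apply(lambda col: col.dropna().iloc[-1])

# Alternative: carry each column's last valid value down, then read the last row.
last_by_ffill = df.ffill().iloc[-1]

print(last_by_dropna)
print(last_by_ffill)
```

Neither variant depends on the index labels, so no reset_index is needed. The ffill version avoids a Python-level lambda, at the cost of building a filled copy of the frame.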
sacuL answered Oct 05 '22 16:10
