I have DataFrame in view of <code>Name</code> and <code>Date</code> with values of weight in cells : <pre class="prettyprint"><code>Name Jan17 Jun18 Dec18 Apr19 count Nick 0 1.7 3.7 0 2 Jack 0 0 2.8 3.5 2 Fox 0 1.7 0 0 1 Rex 1.0 0 3.0 4.2 3 Snack 0 0 2.8 4.4 2 Yosee 0 0 0 4.3 1 Petty 0.5 1.3 2.8 3.5 4 </code></pre> <code>Start</code> and <code>Finish</code> should be added to the dataFrame in reference to the next definition: <ol> <li><code>Start</code> first non zero value in row started from <code>Jan17</code> column to <code>Apr19</code> </li> <li><code>Finish</code> first non zero value in sequence <code>Apr19</code> till to <code>Jan17</code></li> </ol> Also, if row has only one non-zero value in row then <code>Start</code> and<code>Finish</code> are the same. To find first non zero element in row I tried <code>data[col].keys, np.argmax()</code> and it works as expected. <code>date_col_list = ['Jan17','Jun18','Dec18', 'Apr19']</code> <code>data['Start']=data[date_col_list].keys([np.argmax(data[date_col_list].values!=0, axis=1)]</code> Result is: <pre class="prettyprint"><code>Name Jan17 Jun18 Dec18 Apr19 count Start Nick 0 1.7 3.7 0 2 Jun18 Jack 0 0 2.8 3.5 2 Dec18 Fox 0 1.7 0 0 1 Jun18 Rex 1.0 0 3.0 4.2 3 Jan18 Snack 0 0 2.8 4.4 2 Dec18 Yosee 0 0 0 4.3 1 Apr19 Petty 0.5 1.3 2.8 3.5 4 Jan17 </code></pre> To detect values for <code>Finish</code> column I tried to use: <code>np.apply_along_axis</code> as: <code>def func_X(i):</code> <code>return np.argmax(np.where(i!=0))</code> <code>np.apply_along_axis(func1d = func_X, axis=1, arr=data[date_col_list].values)</code> Result is error: <code>'tuple' object has no attribute 'argmax'</code> Expected dataframe is: <pre class="prettyprint"><code>Name Jan17 Jun18 Dec18 Apr19 count Start Finish Nick 0 1.7 3.7 0 2 Jun18 Dec18 Jack 0 0 2.8 3.5 2 Dec18 Apr19 Fox 0 1.7 0 0 1 Jun18 Jun18 Rex 1.0 0 3.0 4.2 3 Jan18 Apr19 Snack 0 0 2.8 4.4 2 Dec18 Apr19 Yosee 0 0 0 4.3 1 Apr19 Apr19 Petty 0.5 1.3 2.8 3.5 4 Jan17 Apr19 </code></pre> How can I find <code>Finish</code> in reference to non-zero value in direction from the last column (<code>Apr19</code>) to the first one (<code>Jan17</code>)?

<h3><code>first_valid_index</code> and <code>last_valid_index</code></h3> <pre class="prettyprint"><code>d = df.mask(df == 0).drop(['Name', 'count'], 1) df.assign( Start=d.apply(pd.Series.first_valid_index, 1), Finish=d.apply(pd.Series.last_valid_index, 1) ) Name Jan17 Jun18 Dec18 Apr19 count Start Finish 0 Nick 0.0 1.7 3.7 0.0 2 Jun18 Dec18 1 Jack 0.0 0.0 2.8 3.5 2 Dec18 Apr19 2 Fox 0.0 1.7 0.0 0.0 1 Jun18 Jun18 3 Rex 1.0 0.0 3.0 4.2 3 Jan17 Apr19 4 Snack 0.0 0.0 2.8 4.4 2 Dec18 Apr19 5 Yosee 0.0 0.0 0.0 4.3 1 Apr19 Apr19 6 Petty 0.5 1.3 2.8 3.5 4 Jan17 Apr19 </code></pre> <hr> <h3><code>stack</code> then <code>groupby</code></h3> <pre class="prettyprint"><code>d = df.mask(df == 0).drop(['Name', 'count'], 1) def fl(s): return s.xs(s.name).index[[0, -1]] s, f = d.stack().groupby(level=0).apply(fl).str df.assign(Start=s, Finish=f) Name Jan17 Jun18 Dec18 Apr19 count Start Finish 0 Nick 0.0 1.7 3.7 0.0 2 Jun18 Dec18 1 Jack 0.0 0.0 2.8 3.5 2 Dec18 Apr19 2 Fox 0.0 1.7 0.0 0.0 1 Jun18 Jun18 3 Rex 1.0 0.0 3.0 4.2 3 Jan17 Apr19 4 Snack 0.0 0.0 2.8 4.4 2 Dec18 Apr19 5 Yosee 0.0 0.0 0.0 4.3 1 Apr19 Apr19 6 Petty 0.5 1.3 2.8 3.5 4 Jan17 Apr19 </code></pre>

Using <code>cumsum</code> on the underlying <code>array</code> <pre class="prettyprint"><code>m = df.drop(['Name', 'count'], axis=1) u = m.to_numpy().cumsum(1) start = (u!=0).argmax(1) end = u.argmax(1) df.assign(start=m.columns[start], end=m.columns[end]) </code></pre> <pre class="prettyprint"><code> Name Jan17 Jun18 Dec18 Apr19 count start end 0 Nick 0.0 1.7 3.7 0.0 2 Jun18 Dec18 1 Jack 0.0 0.0 2.8 3.5 2 Dec18 Apr19 2 Fox 0.0 1.7 0.0 0.0 1 Jun18 Jun18 3 Rex 1.0 0.0 3.0 4.2 3 Jan17 Apr19 4 Snack 0.0 0.0 2.8 4.4 2 Dec18 Apr19 5 Yosee 0.0 0.0 0.0 4.3 1 Apr19 Apr19 6 Petty 0.5 1.3 2.8 3.5 4 Jan17 Apr19 </code></pre>

Find first and last non-zero column in each row of a pandas dataframe

I have DataFrame in view of Name and Date with values of weight in cells :

Name        Jan17  Jun18  Dec18    Apr19  count 
Nick         0      1.7     3.7      0     2
Jack         0       0      2.8     3.5    2       
Fox          0      1.7      0       0     1
Rex          1.0     0      3.0     4.2    3
Snack        0       0      2.8     4.4    2
Yosee        0       0       0      4.3    1 
Petty        0.5    1.3     2.8     3.5    4

Start and Finish should be added to the dataFrame in reference to the next definition:

Start first non zero value in row started from Jan17 column to Apr19
Finish first non zero value in sequence Apr19 till to Jan17

Also, if row has only one non-zero value in row then Start andFinish are the same.

To find first non zero element in row I tried data[col].keys, np.argmax() and it works as expected.

date_col_list = ['Jan17','Jun18','Dec18', 'Apr19']

data['Start']=data[date_col_list].keys([np.argmax(data[date_col_list].values!=0, axis=1)]

Result is:

Name        Jan17  Jun18  Dec18    Apr19  count   Start 
Nick         0      1.7     3.7      0     2      Jun18        
Jack         0       0      2.8     3.5    2      Dec18           
Fox          0      1.7      0       0     1      Jun18    
Rex          1.0     0      3.0     4.2    3      Jan18    
Snack        0       0      2.8     4.4    2      Dec18    
Yosee        0       0       0      4.3    1      Apr19     
Petty        0.5    1.3     2.8     3.5    4      Jan17

To detect values for Finish column I tried to use: np.apply_along_axis as: def func_X(i): return np.argmax(np.where(i!=0))

np.apply_along_axis(func1d = func_X, axis=1, arr=data[date_col_list].values)

Result is error: 'tuple' object has no attribute 'argmax'

Expected dataframe is:

Name        Jan17  Jun18  Dec18    Apr19  count   Start  Finish
Nick         0      1.7     3.7      0     2      Jun18   Dec18     
Jack         0       0      2.8     3.5    2      Dec18   Apr19        
Fox          0      1.7      0       0     1      Jun18   Jun18 
Rex          1.0     0      3.0     4.2    3      Jan18   Apr19 
Snack        0       0      2.8     4.4    2      Dec18   Apr19 
Yosee        0       0       0      4.3    1      Apr19   Apr19  
Petty        0.5    1.3     2.8     3.5    4      Jan17   Apr19

How can I find Finish in reference to non-zero value in direction from the last column (Apr19) to the first one (Jan17)?

How do you find missing values in rows in pandas?

You can extract rows/columns containing missing values from pandas. DataFrame by using the isnull() or isna() method that checks if an element is a missing value.

How do you find the last and first row of a data frame?

You can get the first row with iloc[0] and the last row with iloc[-1] . If you want to get the value of the element, you can do with iloc[0]['column_name'] , iloc[-1]['column_name'] .

Is at and LOC same in pandas?

at is a single element and using . loc maybe a Series or a DataFrame. Returning single value is not the case always. It returns array of values if the provided index is used multiple times.

`first_valid_index` and `last_valid_index`

d = df.mask(df == 0).drop(['Name', 'count'], 1)
df.assign(
    Start=d.apply(pd.Series.first_valid_index, 1),
    Finish=d.apply(pd.Series.last_valid_index, 1)
)

    Name  Jan17  Jun18  Dec18  Apr19  count  Start Finish
0   Nick    0.0    1.7    3.7    0.0      2  Jun18  Dec18
1   Jack    0.0    0.0    2.8    3.5      2  Dec18  Apr19
2    Fox    0.0    1.7    0.0    0.0      1  Jun18  Jun18
3    Rex    1.0    0.0    3.0    4.2      3  Jan17  Apr19
4  Snack    0.0    0.0    2.8    4.4      2  Dec18  Apr19
5  Yosee    0.0    0.0    0.0    4.3      1  Apr19  Apr19
6  Petty    0.5    1.3    2.8    3.5      4  Jan17  Apr19

`stack` then `groupby`

d = df.mask(df == 0).drop(['Name', 'count'], 1)
def fl(s): return s.xs(s.name).index[[0, -1]]
s, f = d.stack().groupby(level=0).apply(fl).str
df.assign(Start=s, Finish=f)

    Name  Jan17  Jun18  Dec18  Apr19  count  Start Finish
0   Nick    0.0    1.7    3.7    0.0      2  Jun18  Dec18
1   Jack    0.0    0.0    2.8    3.5      2  Dec18  Apr19
2    Fox    0.0    1.7    0.0    0.0      1  Jun18  Jun18
3    Rex    1.0    0.0    3.0    4.2      3  Jan17  Apr19
4  Snack    0.0    0.0    2.8    4.4      2  Dec18  Apr19
5  Yosee    0.0    0.0    0.0    4.3      1  Apr19  Apr19
6  Petty    0.5    1.3    2.8    3.5      4  Jan17  Apr19

In your case try something different with dot

s=df.loc[:,'Jan17':'Apr19'].ne(0)
s=s.dot(s.columns+',').str[:-1].str.split(',')
s.str[0],s.str[-1]
Out[899]: 
(0    Jun18
 1    Dec18
 2    Jun18
 3    Jan17
 4    Dec18
 5    Apr19
 6    Jan17
 dtype: object, 0    Dec18
 1    Apr19
 2    Jun18
 3    Apr19
 4    Apr19
 5    Apr19
 6    Apr19
 dtype: object)
 #df['Start'],df['End']=s.str[0],s.str[-1]

`idxmax`

mask = df.drop(['Name', 'count'], axis=1) > 0
df.assign(start=mask.idxmax(axis=1), end=mask.iloc[:,::-1].idxmax(axis=1))

    Name  Jan17  Jun18  Dec18  Apr19  count  start    end
0   Nick    0.0    1.7    3.7    0.0      2  Jun18  Dec18
1   Jack    0.0    0.0    2.8    3.5      2  Dec18  Apr19
2    Fox    0.0    1.7    0.0    0.0      1  Jun18  Jun18
3    Rex    1.0    0.0    3.0    4.2      3  Jan17  Apr19
4  Snack    0.0    0.0    2.8    4.4      2  Dec18  Apr19
5  Yosee    0.0    0.0    0.0    4.3      1  Apr19  Apr19
6  Petty    0.5    1.3    2.8    3.5      4  Jan17  Apr19

Drop irrelevant columns, then use idxmax first on the columns, then on the reversed columns to find the first and last valid indices respectively.

Using cumsum on the underlying array

m = df.drop(['Name', 'count'], axis=1)
u = m.to_numpy().cumsum(1)

start = (u!=0).argmax(1)
end = u.argmax(1)

df.assign(start=m.columns[start], end=m.columns[end])

    Name  Jan17  Jun18  Dec18  Apr19  count  start    end
0   Nick    0.0    1.7    3.7    0.0      2  Jun18  Dec18
1   Jack    0.0    0.0    2.8    3.5      2  Dec18  Apr19
2    Fox    0.0    1.7    0.0    0.0      1  Jun18  Jun18
3    Rex    1.0    0.0    3.0    4.2      3  Jan17  Apr19
4  Snack    0.0    0.0    2.8    4.4      2  Dec18  Apr19
5  Yosee    0.0    0.0    0.0    4.3      1  Apr19  Apr19
6  Petty    0.5    1.3    2.8    3.5      4  Jan17  Apr19

Find first and last non-zero column in each row of a pandas dataframe

Tags:

python

pandas

dataframe

argmax

Cindy

People also ask

4 Answers

`first_valid_index` and `last_valid_index`

`stack` then `groupby`

piRSquared

BENY

`idxmax`

cs95

user3483203

Recent Activity

Donate For Us

Find first and last non-zero column in each row of a pandas dataframe

Tags:

python

pandas

dataframe

argmax

Cindy

People also ask

4 Answers

first_valid_index and last_valid_index

stack then groupby

piRSquared

BENY

idxmax

cs95

user3483203

Related questions

Recent Activity

Donate For Us

`first_valid_index` and `last_valid_index`

`stack` then `groupby`

`idxmax`