
Identify the first and all non-zero values in every row in Pandas DataFrame

Tags: python, pandas

I have a Pandas DataFrame similar to the following:

import pandas as pd

data = pd.DataFrame([['Juan', 0, 0, 400, 450, 500], ['Luis', 100, 100, 100, 100, 100], ['Maria', 0, 20, 50, 300, 500], ['Laura', 0, 0, 0, 100, 900], ['Lina', 0, 0, 0, 0, 10]])
data.columns = ['Name', 'Date1', 'Date2', 'Date3', 'Date4', 'Date5']
    
    Name  Date1  Date2  Date3  Date4  Date5
0   Juan      0      0    400    450    500
1   Luis    100    100    100    100    100
2  Maria      0     20     50    300    500
3  Laura      0      0      0    100    900
4   Lina      0      0      0      0     10

I want to generate two separate DataFrames. The first should contain a 1 at every location where the original DataFrame has a non-zero value, i.e.

    Name  Date1  Date2  Date3  Date4  Date5
0   Juan      0      0      1      1      1
1   Luis      1      1      1      1      1
2  Maria      0      1      1      1      1
3  Laura      0      0      0      1      1
4   Lina      0      0      0      0      1

The second should have a 1 only at the first non-zero value of each row, and 0 elsewhere:

    Name  Date1  Date2  Date3  Date4  Date5
0   Juan      0      0      1      0      0
1   Luis      1      0      0      0      0
2  Maria      0      1      0      0      0
3  Laura      0      0      0      1      0
4   Lina      0      0      0      0      1

I checked other posts and found that I can get the first with the following:

out=data.copy()
out.iloc[:,1:6]=data.select_dtypes(include=['number']).where(data.select_dtypes(include=['number'])==0,1)

Is there an easier or simpler way to achieve the first result?

And does anyone know how to achieve the second result? (Other than a double loop that compares the values one by one, which is the brute-force approach I would rather avoid.)

Juan Ossa asked Aug 15 '20 07:08


1 Answer

For the first result, select only the numeric columns and replace every non-zero value with 1 using DataFrame.mask. For the second, take the cumulative sum along axis=1, compare it with 1 using DataFrame.eq to find the first 1 in each row, and convert the resulting boolean mask to integers with DataFrame.astype:

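import numpy as np

df1, df2 = data.copy(), data.copy()
# Numeric columns only, so 'Name' is left untouched
cols = df1.select_dtypes(include=np.number).columns
# Replace every non-zero value with 1
df1[cols] = df1[cols].mask(data[cols].ne(0), 1)

# The row-wise cumulative sum of the 0/1 frame equals 1 at the first non-zero value
df2[cols] = df1[cols].cumsum(axis=1).eq(1).astype(int)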
print(df1)
    Name  Date1  Date2  Date3  Date4  Date5
0   Juan      0      0      1      1      1
1   Luis      1      1      1      1      1
2  Maria      0      1      1      1      1
3  Laura      0      0      0      1      1
4   Lina      0      0      0      0      1

print(df2)
    Name  Date1  Date2  Date3  Date4  Date5
0   Juan      0      0      1      0      0
1   Luis      1      0      0      0      0
2  Maria      0      1      0      0      0
3  Laura      0      0      0      1      0
4   Lina      0      0      0      0      1
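
One caveat worth checking against your own data: the cumulative-sum comparison relies on a row never returning to zero after its first non-zero value, which holds for the sample data. In a row such as 0, 30, 0, 40, 0 the cumulative sum stays at 1 on the zero that follows, so that position would be flagged as well. The following is a minimal sketch of one way to guard against this, assuming the same column layout as in the question. The row 'Ana' is hypothetical and not part of the original data. It also builds the indicator frame directly with ne(0).astype(int), which may be a simpler route to the first result.

```python
import pandas as pd

# Sample frame; the 'Ana' row is a hypothetical case with zeros
# appearing after the first non-zero value.
data = pd.DataFrame(
    [["Juan", 0, 0, 400, 450, 500],
     ["Ana", 0, 30, 0, 40, 0]],
    columns=["Name", "Date1", "Date2", "Date3", "Date4", "Date5"],
)

# Numeric columns only, so 'Name' is left untouched.
cols = data.select_dtypes(include="number").columns

# First result: 1 wherever the original value is non-zero.
indicator = data[cols].ne(0).astype(int)
df1 = data.copy()
df1[cols] = indicator

# Second result: the running count of non-zero values is 1 AND the
# current value itself is non-zero, so later zeros are not flagged.
first = indicator.cumsum(axis=1).eq(1) & indicator.eq(1)
df2 = data.copy()
df2[cols] = first.astype(int)

print(df1)
print(df2)
```

Without the extra `& indicator.eq(1)` condition, the 'Ana' row would get a 1 in both Date2 and Date3.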
jezrael answered Oct 29 '22 14:10