I have a pandas DataFrame that holds elevation differences between points every 10 degrees for several target turbines. I have selected the elevation differences that meet a criterion and added a column that shows whether the points are consecutive (metDegDiff = 10 marks consecutive points).
For each targTurb, how can I select the maximum elevDif within runs of 3 or more consecutive 10-degree points?
import pandas as pd
ridgeDF2 = pd.DataFrame(data = {
'MetID':['A06_40','A06_50','A06_60','A06_70','A06_80','A06_100','A06_110','A06_140','A07_110','A07_130','A07_140','A08_100','A08_110','A08_120','A08_130','A08_220'],
'targTurb':['A06','A06','A06','A06','A06','A06','A06','A06','A07','A07','A07','A08','A08','A08','A08','A08'],
'metDeg':[30,50,60,70,80,100,110,140,110,130,140,100,110,120,130,220],
'elevDif':[1.433234, 1.602997,3.227997,2.002991,2.414001,2.96402,1.513,1.793976,1.612,2.429993,1.639008,1.500977,3.048004,2.174011,1.813995,1.527008],
'metDegDiff':[20,10,10,10,10,20,10,30,-30,20,10,-40,10,10,10,30]})
[Dbg]>>> ridgeDF2
      MetID  targTurb  metDeg   elevDif  metDegDiff
 0   A06_40       A06      30  1.433234          20
 1   A06_50       A06      50  1.602997          10
 2   A06_60       A06      60  3.227997          10
 3   A06_70       A06      70  2.002991          10
 4   A06_80       A06      80  2.414001          10
 5  A06_100       A06     100  2.964020          20
 6  A06_110       A06     110  1.513000          10
 7  A06_140       A06     140  1.793976          30
 8  A07_110       A07     110  1.612000         -30
 9  A07_130       A07     130  2.429993          20
10  A07_140       A07     140  1.639008          10
11  A08_100       A08     100  1.500977         -40
12  A08_110       A08     110  3.048004          10
13  A08_120       A08     120  2.174011          10
14  A08_130       A08     130  1.813995          10
15  A08_220       A08     220  1.527008          30
In the example, A06 has 4 consecutive rows with metDegDiff = 10 (rows 1, 2, 3 and 4) and A08 has 3 (rows 12, 13 and 14). Both runs have a length of 3 or more.
So the output would be the row with the maximum elevDif in each of those two runs, like this:
  MetID  targTurb  metDeg   elevDif  metDegDiff
 A06_60       A06      60  3.227997          10
A08_110       A08     110  3.048004          10
A pandas Series has an argmax() method that returns the integer position of the maximum value; idxmax() returns the index label of that value instead. A Series is a one-dimensional data structure with an index, and its values can be accessed through the index labels.
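A small sketch of the difference between the two calls. The values and labels here are made up for illustration and are not the full data from the question.

```python
import pandas as pd

# Hypothetical values and labels, chosen only for illustration
s = pd.Series([1.6, 3.2, 2.0], index=['A06_50', 'A06_60', 'A06_70'])

pos = s.argmax()    # integer position of the maximum
label = s.idxmax()  # index label of the maximum

# Position-based and label-based access reach the same value
print(pos, label, s.iloc[pos], s.loc[label])
```

idxmax() is usually the more convenient of the two when the result is fed back into .loc, as in the answer further down.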
To get the maximum value of each row of a DataFrame, call max() on the DataFrame with axis=1. It returns a Series of maximum values whose index holds the row labels and whose values are the maxima of each row.
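For example, with a tiny made-up frame (the column names and row labels are hypothetical):

```python
import pandas as pd

# Hypothetical frame, for illustration only
df = pd.DataFrame({'a': [1, 5], 'b': [4, 2]}, index=['r1', 'r2'])

# axis=1 takes the maximum across the columns of each row
row_max = df.max(axis=1)
print(row_max)
```

Note that this is a row-wise maximum across columns, which is a different operation from the per-group maximum the question asks about.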
The code below should work. You can run each line separately to see what is happening.
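# Flag rows that break a run (metDegDiff is not 10)
ridgeDF2['t/f'] = ridgeDF2['metDegDiff'] != 10
# Turn the flags into a run id; the shift makes each break row close the run before it
ridgeDF2['t/f'] = ridgeDF2['t/f'].shift(fill_value=False).cumsum()
# Rows per run id, minus 1 for the break row that closes the run
ridgeDF2['count'] = ridgeDF2.groupby('t/f')['t/f'].transform('size') - 1
# Keep only runs of 3 or more, and never the break rows themselves
ridgeDF2['count'] = ridgeDF2['count'] >= 3
ridgeDF2.loc[ridgeDF2['metDegDiff'] != 10, 'count'] = False
highest = ridgeDF2.loc[ridgeDF2['count']]
# Row with the largest elevDif in each remaining run
highest = highest.loc[highest.groupby(['targTurb','metDegDiff','t/f'])['elevDif'].idxmax()]
highest.drop(columns = ['t/f','count'])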
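As an alternative, here is a self-contained sketch that leaves ridgeDF2 unmodified and also starts a new run whenever targTurb changes. It is a different technique from the code above, not a drop-in copy of it. The names is_step, new_turb, run_id, steps, run_len, long_runs and result are made up for this example, and it assumes that a row with metDegDiff == 10 counts as a member of a run, as in the question.

```python
import pandas as pd

ridgeDF2 = pd.DataFrame({
    'MetID': ['A06_40', 'A06_50', 'A06_60', 'A06_70', 'A06_80', 'A06_100',
              'A06_110', 'A06_140', 'A07_110', 'A07_130', 'A07_140',
              'A08_100', 'A08_110', 'A08_120', 'A08_130', 'A08_220'],
    'targTurb': ['A06'] * 8 + ['A07'] * 3 + ['A08'] * 5,
    'metDeg': [30, 50, 60, 70, 80, 100, 110, 140, 110, 130, 140,
               100, 110, 120, 130, 220],
    'elevDif': [1.433234, 1.602997, 3.227997, 2.002991, 2.414001, 2.96402,
                1.513, 1.793976, 1.612, 2.429993, 1.639008, 1.500977,
                3.048004, 2.174011, 1.813995, 1.527008],
    'metDegDiff': [20, 10, 10, 10, 10, 20, 10, 30, -30, 20, 10,
                   -40, 10, 10, 10, 30],
})

# Assumption: a row belongs to a run when its metDegDiff is exactly 10
is_step = ridgeDF2['metDegDiff'].eq(10)
# True on the first row of each turbine
new_turb = ridgeDF2['targTurb'].ne(ridgeDF2['targTurb'].shift())
# A new run id starts at every break row and at every turbine change
run_id = (~is_step | new_turb).cumsum()

# Keep only the rows that are part of a run, tagged with their run id
steps = ridgeDF2.assign(run_id=run_id)[is_step]
# Number of rows in each run, broadcast back to every row of that run
run_len = steps.groupby('run_id')['elevDif'].transform('size')
long_runs = steps[run_len >= 3]

# Row with the largest elevDif in each run of 3 or more
result = long_runs.loc[long_runs.groupby('run_id')['elevDif'].idxmax()]
result = result.drop(columns='run_id')
print(result)
```

On the sample data this selects A06_60 and A08_110, matching the expected output in the question. Because the run length is counted only over rows with metDegDiff == 10, a run that reaches the last row of the frame is counted in full even without a break row after it.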