Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

group by pandas dataframe and select maximun value within sequence

I have a pandas dataframe that represents elevation differences between points every 10 degrees for several target Turbines. I have selected the elevation differences that follow a criteria and I have added a column that represents if they are consecutive or not (metDegDiff = 10 represents consecutive points).

How can I select the maximum value of elevDif by targTurb in 3 or more consecutive 10 degree points?

ridgeDF2 = pd.DataFrame(data = {
    'MetID':['A06_40','A06_50','A06_60','A06_70','A06_80','A06_100','A06_110','A06_140','A07_110','A07_130','A07_140','A08_100','A08_110','A08_120','A08_130','A08_220'],
    'targTurb':['A06','A06','A06','A06','A06','A06','A06','A06','A07','A07','A07','A08','A08','A08','A08','A08'],
    'metDeg':[30,50,60,70,80,100,110,140,110,130,140,100,110,120,130,220],
    'elevDif':[1.433234, 1.602997,3.227997,2.002991,2.414001,2.96402,1.513,1.793976,1.612,2.429993,1.639008,1.500977,3.048004,2.174011,1.813995,1.527008],
    'metDegDiff':[20,10,10,10,10,20,10,30,-30,20,10,-40,10,10,10,30]})
[Dbg]>>> ridgeDF2
      MetID targTurb  metDeg   elevDif  metDegDiff
0    A06_40      A06      30  1.433234          20
1    A06_50      A06      50  1.602997          10
2    A06_60      A06      60  3.227997          10
3    A06_70      A06      70  2.002991          10
4    A06_80      A06      80  2.414001          10
5   A06_100      A06     100  2.964020          20
6   A06_110      A06     110  1.513000          10
7   A06_140      A06     140  1.793976          30
8   A07_110      A07     110  1.612000         -30
9   A07_130      A07     130  2.429993          20
10  A07_140      A07     140  1.639008          10
11  A08_100      A08     100  1.500977         -40
12  A08_110      A08     110  3.048004          10
13  A08_120      A08     120  2.174011          10
14  A08_130      A08     130  1.813995          10
15  A08_220      A08     220  1.527008          30

In the example, for A06 there are 4 rows that have consecutive 10 metDeg values (rows 1, 2, 3, and 4) and for A8 there are 3 rows (rows 12, 13 and 14). Note that those 2 series have a length of 3 or more.

So, the output would be the maximum elevDif inside those two selected series. Like this:

MetID  targTurb  metDeg   elevDif  metDegDiff
A06_60      A06      60  3.227997          10
A08_110     A08     110  3.048004          10
like image 803
casti Avatar asked Jul 06 '20 20:07

casti


People also ask

Where is Max value in Panda series?

In the pandas series constructor, there is a method called argmax() which is used to get the position of maximum value over the series data. The pandas series is a single-dimensional data structure object with row index values. By using row index values we can access the data.

How do you find the maximum value of each row in a Dataframe?

Get max value from a row of a Dataframe in Python For the maximum value of each row, call the max() method on the Dataframe object with an argument axis=1. In the output, we can see that it returned a series of maximum values where the index is the row name and values are the maxima from each row.


1 Answers

The code below should work. You can run each line separately to see what is happening.

ridgeDF2['t/f'] = ridgeDF2['metDegDiff'] != 10
ridgeDF2['t/f'] = ridgeDF2['t/f'].shift().fillna(0).cumsum()
ridgeDF2['count'] = ridgeDF2.groupby('t/f')['t/f'].transform(len)-1
ridgeDF2['count'] = np.where(ridgeDF2['count'] >= 3,True,False)
ridgeDF2.loc[ridgeDF2['metDegDiff'] != 10,'count'] = False
highest = ridgeDF2.loc[ridgeDF2['count'] == True]
highest = highest.loc[highest.groupby(['targTurb','metDegDiff','t/f'])['elevDif'].idxmax()]
highest.drop(columns = ['t/f','count'])
like image 94
rhug123 Avatar answered Oct 31 '22 14:10

rhug123