Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the largest number in column

I'm trying to find the month (column 'Month') that has the largest number (in DepDelay column)

Data

flightID         Month  ArrTime ActualElapsedTime  DepDelay   ArrDelay
BBYYEUVY67527        1   1514.0               58.0       NA      64.0   
MUPXAQFN40227        1     37.0              120.0       13      52.0   
LQLYUIMN79169        1    916.0              166.0       NA     -25.0   
KTAMHIFO10843        1      NaN                NaN        5       NaN   
BOOXJTEY23623        1      NaN                NaN        4       NaN  
BBYYEUVY67527        2   1514.0               58.0       NA      64.0   
MUPXAQFN40227        2     37.0              120.0       NA      52.0   
LQLYUIMN79169        2    916.0              166.0       NA     -25.0   
KTAMHIFO10843        2      NaN                NaN       15       NaN   
BOOXJTEY23623        2      NaN                NaN        4       NaN  

I tried:

data = pd.read_csv('data.csv', sep='\t')

dep_delay = all_data.groupby(["Month"].DepDelay.count().max())

print(dep_delay)

Error:

AttributeError                            Traceback (most recent call last)
<ipython-input-14-2ea6213009d6> in <module>()
----> 1 dep_delay = all_data.groupby(["Month"].DepDelay.count().max())
      2 
      3 print(dep_delay)

AttributeError: 'list' object has no attribute 'DepDelay'

Good output:

Month      DepDelay
    1            22
like image 789
marin Avatar asked Dec 31 '25 03:12

marin


2 Answers

You need sum instead of count to sum values by group. Here's one way using GroupBy + sum, and then idxmax:

res = df.groupby('Month')['DepDelay'].sum().reset_index()
res = res.loc[[res['DepDelay'].idxmax()]]

print(res)

   Month  DepDelay
0      1      22.0

Alternatively, you can group and sort, then extract the first row:

res = df.groupby('Month')['DepDelay'].sum()\
        .sort_values(ascending=False).head(1)\
        .reset_index()

print(res)

   Month  DepDelay
0      1      22.0
like image 105
jpp Avatar answered Jan 01 '26 17:01

jpp


Another approach:

pd.DataFrame(df.loc[df['DepDelay'].idxmax(), ['Month', 'DepDelay']]).T
#  Month DepDelay
#8     2       15

And you can reset index to change 8 to 0:

pd.DataFrame(df.loc[df['DepDelay'].idxmax(), ['Month', 'DepDelay']]).T.reset_index(drop=True)
#  Month DepDelay
#0     2       15
like image 36
zipa Avatar answered Jan 01 '26 17:01

zipa



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!