Finding index of a pandas DataFrame value

I am trying to process some .csv data using pandas, and I am struggling with something that I am sure is a rookie move, but after spending a lot of time trying to make this work, I need your help.

Essentially, I am trying to find the index of a value within a dataframe I have created.

max_value = cd_gross_revenue.max()
# max value of the cd_gross_revenue dataframe
# (named max_value so the built-in max() is not shadowed)

print(max_value)
# finds max value, no problem!

maxindex = cd_gross_revenue.idxmax()
print(maxindex)
# finds index of the max value, what I wanted!

print(max_value.index)
# ERROR: AttributeError: 'numpy.float64' object has no attribute 'index'

The maxindex variable gets me the answer using idxmax(), but what if I am not looking for the index of the max value? What if I want the index of some arbitrary value instead? How would I go about it? Clearly .index does not work for me here.

Thanks in advance for any help!

ploo asked Oct 01 '14 20:10


3 Answers

Use a boolean mask to select the rows where the value equals the one you are looking for. Then use that mask to index the dataframe or series, and read the .index attribute of the result. For example:

In [9]: s = pd.Series(range(10,20))

In [10]: s
Out[10]:
0    10
1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64

In [11]: val_mask = s == 13

In [12]: val_mask
Out[12]:
0    False
1    False
2    False
3     True
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool

In [15]: s[val_mask]
Out[15]:
3    13
dtype: int64

In [16]: s[val_mask].index
Out[16]: Int64Index([3], dtype='int64')
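
As a minimal sketch of the same mask-then-index approach, here it is on a Series with a non-default index and a repeated value. The `revenue` data and the `index_of_value` helper are hypothetical, made up for illustration only:

```python
import pandas as pd

# Hypothetical data: revenue figures keyed by month label.
revenue = pd.Series(
    [120.0, 95.5, 120.0, 80.25],
    index=["jan", "feb", "mar", "apr"],
)


def index_of_value(series, value):
    """Return a list of the index labels whose entries equal `value`."""
    mask = series == value              # boolean Series, True where equal
    return series[mask].index.tolist()  # labels of the matching rows


matches = index_of_value(revenue, 120.0)  # value appears twice
missing = index_of_value(revenue, 1.0)    # value does not appear

print(matches)
print(missing)
```

Because the mask can match several rows or none, the result is a list of labels rather than a single label. For floating-point data, exact equality can miss values that differ only by rounding; a tolerance-based comparison such as `numpy.isclose` may be a better fit there.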
Daniel answered Nov 12 '22 06:11


s[s==13]

For example:

from pandas import Series

s = Series(range(10,20))
s[s==13]

3    13
dtype: int64
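
The question mentions a dataframe, so here is a hedged sketch of how the same comparison might look on one. The frame, its column names, and its index labels are assumptions for illustration and do not come from the original data:

```python
import pandas as pd

# Hypothetical frame: two numeric columns, quarter labels as the index.
df = pd.DataFrame(
    {"cd": [10, 13, 15], "vinyl": [13, 11, 12]},
    index=["q1", "q2", "q3"],
)

# Index labels of rows where one specific column equals the value.
cd_rows = df.index[df["cd"] == 13].tolist()

# Index labels of rows where any column equals the value.
any_rows = df.index[df.eq(13).any(axis=1)].tolist()

print(cd_rows)
print(any_rows)
```

Indexing `df.index` with the mask skips building the filtered frame when only the labels are needed.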
Adam Hughes answered Nov 12 '22 05:11


When you called idxmax(), it returned the key in the index that corresponds to the max value. To get the value itself, pass that key back to the dataframe with .loc:

max_key = cd_gross_revenue.idxmax()
max_value = cd_gross_revenue.loc[max_key]
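
For illustration, a small sketch of that round trip, using a made-up Series as a stand-in for cd_gross_revenue (the real data is not shown in the question, and the `tied` Series is likewise hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the asker's data.
cd_gross_revenue = pd.Series([3.5, 9.25, 7.0], index=["2012", "2013", "2014"])

max_key = cd_gross_revenue.idxmax()        # index label of the max value
max_value = cd_gross_revenue.loc[max_key]  # value stored at that label

print(max_key, max_value)

# With ties, idxmax() reports only the first occurrence,
# while a boolean mask reports every matching label.
tied = pd.Series([5, 9, 9])
first_only = tied.idxmax()
all_labels = tied[tied == tied.max()].index.tolist()

print(first_only, all_labels)
```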
b10n answered Nov 12 '22 05:11