
How to iterate through rows of a dataframe and check whether value in a column row is NaN

I have a beginner question. I am iterating over a DataFrame and want to check whether each value in Column2 is NaN, so that I can perform an action on the value when it is not NaN. My DataFrame looks like this:

df:

  Column1  Column2
0    a        hey
1    b        NaN
2    c        up

What I am trying right now is:

for item, frame in df['Column2'].iteritems():
    if frame.notnull() == True:
        print 'frame'

The thought behind that is that I iterate over the rows in column 2 and print frame for every row that has a value (which is a string). What I get however is this:

AttributeError                            Traceback (most recent call last)
<ipython-input-80-8b871a452417> in <module>()
      1 for item, frame in df['Column2'].iteritems():
----> 2     if frame.notnull() == True:
      3         print 'frame'

AttributeError: 'float' object has no attribute 'notnull'

When I only run the first line of my code, I get

0
hey
1
nan
2
up

which suggests that the floats in the output of the first line are the cause of the error. Can anybody tell me how I can accomplish what I want?

sequence_hard asked Oct 14 '15 11:10


3 Answers

As you already noticed, frame in

for item, frame in df['Column2'].iteritems():

is each value in the column, one per row, so its type is the type of the column's elements (most likely not a Series or DataFrame). Hence, calling frame.notnull() on it does not work.
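
To see this concretely, here is a minimal sketch that rebuilds Column2 from the question and records the Python type of each element. The construction of the Series (including dtype=object) is an assumption, since the question only shows the printed frame.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of Column2 from the question.
column2 = pd.Series(["hey", np.nan, "up"], dtype=object, name="Column2")

# Iterating a Series yields its elements, not Series objects:
# the strings are str, and the missing value NaN is a plain float.
element_types = [type(value).__name__ for value in column2]
print(element_types)
```

A float has no notnull method, which is why the loop in the question fails on the second row.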

Try this instead:

# items() behaves like iteritems(), which was removed in pandas 2.0
for item, frame in df['Column2'].items():
    if pd.notnull(frame):
        print(frame)
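
As a self-contained sketch of this approach, the snippet below rebuilds the DataFrame from the question (the construction is assumed) and performs an action only on the non-NaN values. Upper-casing the string is a hypothetical stand-in for whatever action is actually needed.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the DataFrame shown in the question.
df = pd.DataFrame({
    "Column1": ["a", "b", "c"],
    "Column2": pd.Series(["hey", np.nan, "up"], dtype=object),
})

# pd.notnull works on scalars, so it is safe to call on each element.
results = {}
for index, value in df["Column2"].items():
    if pd.notnull(value):
        # Hypothetical action: upper-case the string.
        results[index] = value.upper()

print(results)
```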

Anand S Kumar answered Oct 18 '22 07:10


Try this:

df[df['Column2'].notnull()]

This returns only the rows in which Column2 is not null.
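
A minimal sketch of this filter on the question's data follows; the DataFrame construction is assumed. It also shows dropna with subset, which should select the same rows here.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the DataFrame shown in the question.
df = pd.DataFrame({
    "Column1": ["a", "b", "c"],
    "Column2": pd.Series(["hey", np.nan, "up"], dtype=object),
})

# Boolean mask: True where Column2 holds a value, False where it is NaN.
mask = df["Column2"].notnull()
filtered = df[mask]

# Alternative spelling: drop rows whose Column2 is missing.
dropped = df.dropna(subset=["Column2"])

print(filtered)
```

Because the filter runs on the whole column at once, no explicit loop is needed to find the rows that have a value.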

Hackaholic answered Oct 18 '22 07:10


Using iteritems on a Series (which is what you get when you take a column from a DataFrame) iterates over pairs (index, value). So your item will take the values 0, 1, and 2 in the three iterations of the loop, and your frame will take the values 'hey', NaN, and 'up' (so "frame" is probably a bad name for it). The error comes from trying to use the method notnull on NaN (which is represented as a floating-point number).

You can use the function pd.notnull instead:

In [3]: pd.notnull(np.nan)
Out[3]: False

In [4]: pd.notnull('hey')
Out[4]: True
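
As a small sketch of why a dedicated function is preferable to a plain comparison: NaN never compares equal to itself, so a test such as value == np.nan cannot detect it, while pd.notnull handles NaN, None, and ordinary values alike. The dictionary keys below are illustrative names only.

```python
import numpy as np
import pandas as pd

nan = np.nan

checks = {
    # NaN is not equal to anything, including itself.
    "nan_equals_itself": nan == nan,
    # pd.notnull treats both NaN and None as missing.
    "notnull_nan": bool(pd.notnull(nan)),
    "notnull_none": bool(pd.notnull(None)),
    "notnull_str": bool(pd.notnull("hey")),
}
print(checks)
```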

Another way would be to use notnull on the whole Series, and then iterate over those values (which are now boolean):

# items() behaves like iteritems(), which was removed in pandas 2.0
for _, value in df['Column2'].notnull().items():
    if value:
        print('frame')
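
Note that this loop only sees booleans. If the loop body needs the original value, one possible variation (a sketch, not part of the answer above) is to drop the missing entries first and iterate over what remains. The Series construction is assumed from the question's data.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of Column2 from the question.
column2 = pd.Series(["hey", np.nan, "up"], dtype=object, name="Column2")

# dropna() removes the NaN entries but keeps the original index labels.
kept = [(index, value) for index, value in column2.dropna().items()]
print(kept)
```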

Evan Wright answered Oct 18 '22 06:10