I have a beginner question. I have a DataFrame I am iterating over, and I want to check whether a value in a Column2 row is NaN or not, so that I can perform an action on the value if it is not NaN. My DataFrame looks like this:
df:
  Column1 Column2
0       a     hey
1       b     NaN
2       c      up
What I am trying right now is:
for item, frame in df['Column2'].iteritems():
    if frame.notnull() == True:
        print 'frame'
The thought behind that is that I iterate over the rows in column 2 and print
frame for every row that has a value (which is a string). What I get however is this:
AttributeError                            Traceback (most recent call last)
<ipython-input-80-8b871a452417> in <module>()
      1 for item, frame in df['Column2'].iteritems():
----> 2     if frame.notnull() == True:
      3         print 'frame'

AttributeError: 'float' object has no attribute 'notnull'
When I only run the first line of my code, I get
0 hey
1 nan
2 up
which suggests that the floats in the output of the first line are the cause of the error. Can anybody tell me how I can accomplish what I want?
To check whether the value at a specific location in pandas is NaN, call numpy.isnan() with the value as the argument. If the value is numpy.nan, the expression returns True; otherwise it returns False. Note that numpy.isnan() only accepts numeric input and raises a TypeError for a string such as 'hey'.
You can check whether a column of a pandas DataFrame contains a particular value (string/int), or any of a list of values, by using pd.Series, the in operator, pandas.Series.isin(), str.
The math.isnan() method checks whether a value is NaN (Not a Number). It returns True if the specified value is NaN; otherwise it returns False.
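A minimal sketch of how these checks behave on the two kinds of values found in the question's Column2, assuming Python 3 with NumPy and pandas installed. The helper name describe_nan_checks is made up for illustration and is not part of any library.

```python
import math

import numpy as np
import pandas as pd


def describe_nan_checks(value):
    """Return what each NaN check reports for value (hypothetical helper)."""
    results = {}
    for name, check in [("math.isnan", math.isnan),
                        ("numpy.isnan", np.isnan),
                        ("pandas.isna", pd.isna)]:
        try:
            results[name] = bool(check(value))
        except TypeError:
            # math.isnan and numpy.isnan reject non-numeric input such as str.
            results[name] = "TypeError"
    return results


nan_report = describe_nan_checks(float("nan"))
str_report = describe_nan_checks("hey")
print(nan_report)
print(str_report)
```

For a column that mixes strings and NaN, this suggests pandas.isna (or its counterpart pandas.notnull) is the safer choice, since it accepts any object.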
As you already understand, frame in
for item, frame in df['Column2'].iteritems():
is the value in each row of the column, and its type is the type of the elements in the column (which most probably would not be Series or DataFrame). Hence, frame.notnull() on it would not work.
You should instead try:
for item, frame in df['Column2'].iteritems():
    if pd.notnull(frame):
        print frame
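The same loop as a sketch for a current setup, assuming Python 3 and a recent pandas. Series.iteritems() was removed in pandas 2.0, so Series.items() is used here instead; the variable names and the rebuilt sample DataFrame are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({"Column1": ["a", "b", "c"],
                   "Column2": ["hey", np.nan, "up"]})

non_null_values = []
for index, value in df["Column2"].items():
    # pd.notnull works on plain scalars, so it handles both str and NaN.
    if pd.notnull(value):
        non_null_values.append(value)
        print(value)
```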
Try this:
df[df['Column2'].notnull()]
The above code will give you the rows for which Column2 has a non-null value.
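As a small sketch of that filtering on the question's data (Python 3 and pandas assumed; variable names are illustrative), the boolean mask keeps rows 0 and 2. DataFrame.dropna with subset is shown as an equivalent alternative for this case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Column1": ["a", "b", "c"],
                   "Column2": ["hey", np.nan, "up"]})

# Boolean mask: True where Column2 holds a value, False where it is NaN.
mask = df["Column2"].notnull()
filtered = df[mask]
print(filtered)

# dropna restricted to Column2 selects the same rows.
dropped = df.dropna(subset=["Column2"])
```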
Using iteritems on a Series (which is what you get when you take a column from a DataFrame) iterates over pairs (index, value). So your item will take the values 0, 1, and 2 in the three iterations of the loop, and your frame will take the values 'hey', NaN, and 'up' (so "frame" is probably a bad name for it). The error comes from trying to use the method notnull on NaN (which is represented as a floating-point number).
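To make this concrete, here is a short sketch that inspects what the loop actually receives. It assumes Python 3 and a recent pandas, uses Series.items() in place of the older iteritems, and builds the column with dtype=object so the values stay plain Python objects.

```python
import numpy as np
import pandas as pd

# dtype=object keeps the strings and the NaN as plain Python objects,
# matching the column in the question.
column = pd.Series(["hey", np.nan, "up"], dtype=object, name="Column2")

# Each iteration yields a scalar, not a Series.
value_types = [type(value).__name__ for _, value in column.items()]
print(value_types)

# Neither str nor float has a notnull method; only pandas objects do.
has_method = [hasattr(value, "notnull") for _, value in column.items()]
print(has_method)
```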
You can use the function pd.notnull instead:
In [3]: pd.notnull(np.nan)
Out[3]: False
In [4]: pd.notnull('hey')
Out[4]: True
Another way would be to use notnull on the whole Series, and then iterate over those values (which are now boolean):
for _, value in df['Column2'].notnull().iteritems():
    if value:
        print 'frame'
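Since that loop only sees booleans, a variation that still has access to the original strings is to select with the mask first and then iterate. This is a sketch under the assumption of Python 3 and a recent pandas (Series.items() instead of iteritems); the names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Column1": ["a", "b", "c"],
                   "Column2": ["hey", np.nan, "up"]})

column = df["Column2"]
collected = []
# Keep only the entries where the mask is True, then loop over what is left.
for index, value in column[column.notnull()].items():
    collected.append((index, value))
    print(index, value)
```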