I checked this post: "finding non-numeric rows in dataframe in pandas?", but it doesn't really answer my question.
my sample data:
    import pandas as pd

    d = {
        'unit': ['UD', 'UD', 'UD', 'UD', 'UD', 'UD'],
        'N-D': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6'],
        'num': [-1.48, 1.7, -6.18, 0.25, 'sum(d)', 0.25]
    }
    df = pd.DataFrame(d)
it looks like this:
      N-D     num unit
    0  Q1   -1.48   UD
    1  Q2    1.70   UD
    2  Q3   -6.18   UD
    3  Q4    0.25   UD
    4  Q5  sum(d)   UD
    5  Q6    0.25   UD
I want to select only the rows whose value in column 'num' is NON-NUMERIC, and I want all of the columns for those rows.
desired output:
      N-D     num unit
    4  Q5  sum(d)   UD
my attempts:
    import numpy as np

    # didn't work: it pulled out everything; besides, I want the condition to check only column 'num'
    nonnumeric = df[~df.applymap(np.isreal).all(1)]

    # didn't work: it pulled out all the rows, and only for column 'num'
    nonnumeric = df['num'][~df.applymap(np.isreal).all(1)]
Python string isnumeric() method: str.isnumeric() checks whether all the characters of a string are numeric characters. It returns True if the string is non-empty and every character is numeric, and False if even one character is non-numeric. A minus sign and a decimal point count as non-numeric, so '-1.48'.isnumeric() is False.
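As a quick illustration of why str.isnumeric() alone is a poor test for the 'num' column in the question, here is a minimal sketch. The sample strings are made up for this example and only standard Python string behavior is used.

```python
# Sample strings chosen for illustration only (not taken from the question's data).
samples = ["22", "-1.48", "1.7", "sum(d)", ""]

# str.isnumeric() is True only when the string is non-empty and
# every character is a numeric character.
results = {s: s.isnumeric() for s in samples}

for text, flag in results.items():
    print(repr(text), flag)
```

Only "22" passes the check; the sign, the decimal point, and the empty string all make it return False, so negative or fractional numbers stored as text would be misclassified.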
In pandas, missing data is represented by two values: None, a Python singleton object that is often used for missing data in Python code, and NaN (an acronym for Not a Number), a special floating-point value recognized by all systems that use the standard IEEE floating-point representation.
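A small sketch of how both missing-value markers behave in an object column like 'num'. The Series below is hypothetical and exists only to show that isnull() treats None and NaN alike, while a non-numeric string is not considered missing.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype Series mixing a float, both missing markers, and a string.
s = pd.Series([1.5, None, np.nan, "sum(d)"], dtype=object)

# isnull() flags both None and NaN; the string is a real (non-missing) value.
missing = s.isnull()
print(missing.tolist())

# NaN never compares equal to itself, which is why isnull() is used instead of ==.
nan_equals_itself = np.nan == np.nan
print(nan_equals_itself)
```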
You can get the number of rows in a pandas DataFrame with len(df.index) or df.shape[0]. df.shape is a (rows, columns) tuple, so its first element is the row count.
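For reference, both ways of counting rows can be sketched on the sample data from the question. The variable names here are chosen for this example only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "unit": ["UD"] * 6,
    "N-D": ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"],
    "num": [-1.48, 1.7, -6.18, 0.25, "sum(d)", 0.25],
})

n_rows_len = len(df.index)   # length of the row index
n_rows_shape = df.shape[0]   # first element of the (rows, columns) tuple
n_rows, n_cols = df.shape    # unpack both dimensions at once

print(n_rows_len, n_rows_shape, n_cols)
```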
Use boolean indexing with a mask created by to_numeric + isnull.
Note: This solution does not flag numbers stored as strings, such as '1' or '22', because to_numeric converts them successfully.
    print(pd.to_numeric(df['num'], errors='coerce'))
    0   -1.48
    1    1.70
    2   -6.18
    3    0.25
    4     NaN
    5    0.25
    Name: num, dtype: float64

    print(pd.to_numeric(df['num'], errors='coerce').isnull())
    0    False
    1    False
    2    False
    3    False
    4     True
    5    False
    Name: num, dtype: bool

    print(df[pd.to_numeric(df['num'], errors='coerce').isnull()])
      N-D     num unit
    4  Q5  sum(d)   UD
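One caveat with the to_numeric mask: a value that was already missing (NaN or None) also becomes NaN after coercion, so it would be reported as non-numeric. The sketch below shows one possible way to exclude such rows by combining the mask with notnull(). The data is a made-up variant of the question's frame, and this refinement is an assumption about what may be wanted, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data: one bad string, one real NaN,
# and one number stored as a string.
df = pd.DataFrame({
    "N-D": ["Q1", "Q2", "Q3", "Q4"],
    "num": [-1.48, "sum(d)", np.nan, "22"],
    "unit": ["UD"] * 4,
})

# Values that cannot be parsed become NaN.
converted = pd.to_numeric(df["num"], errors="coerce")

# Keep rows that failed conversion but were not missing to begin with.
mask = converted.isnull() & df["num"].notnull()
nonnumeric = df[mask]

print(nonnumeric)
```

Here only the 'sum(d)' row is selected: the original NaN is skipped, and '22' is still treated as numeric because to_numeric can parse it.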
Another solution with isinstance and apply:
    print(df[df['num'].apply(lambda x: isinstance(x, str))])
      N-D     num unit
    4  Q5  sum(d)   UD
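Unlike the to_numeric approach, a type check also flags numbers stored as strings. If the goal is "anything that is not an actual number object", a variation is to test against numbers.Number from the standard library instead of str. This is a sketch under that assumption, using a made-up variant of the sample data.

```python
import numbers

import pandas as pd

# Hypothetical variant of the sample data with a number stored as a string.
df = pd.DataFrame({
    "N-D": ["Q1", "Q2", "Q3", "Q4"],
    "num": [-1.48, 1.7, "sum(d)", "22"],
    "unit": ["UD"] * 4,
})

# True where the cell holds a real numeric object (int, float, ...).
is_number = df["num"].apply(lambda x: isinstance(x, numbers.Number))

# Invert the mask to keep the rows whose 'num' is not a number object.
nonnumeric = df[~is_number]

print(nonnumeric)
```

Both 'sum(d)' and '22' are returned here, since both are str objects. Note that NaN is a float and Python booleans are also numbers.Number instances, so neither would be flagged by this check.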