
get non numerical rows in a column pandas python


I checked this post: finding non-numeric rows in dataframe in pandas? but it doesn't really answer my question.

my sample data:

import pandas as pd

d = {
    'unit': ['UD', 'UD', 'UD', 'UD', 'UD', 'UD'],
    'N-D': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6'],
    'num': [-1.48, 1.7, -6.18, 0.25, 'sum(d)', 0.25]
}
df = pd.DataFrame(d)

it looks like this:

  N-D   num    unit
0  Q1  -1.48   UD
1  Q2   1.70   UD
2  Q3  -6.18   UD
3  Q4   0.25   UD
4  Q5   sum(d) UD
5  Q6   0.25   UD

I want to select only the rows whose value in column 'num' is NON-NUMERIC, and I want all of the columns for those rows.

desired output:

  N-D   num    unit
4  Q5   sum(d) UD

my attempts:

# didn't work: it pulled out everything; besides, I want the condition to check only column 'num'
nonnumeric = df[~df.applymap(np.isreal).all(1)]

# didn't work: it pulled out all the rows, for column 'num' only
nonnumeric = df['num'][~df.applymap(np.isreal).all(1)]
Jessica asked May 23 '17 16:05





1 Answer

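Use boolean indexing with a mask created by to_numeric + isnull: to_numeric with errors='coerce' turns every value it cannot parse into NaN, and isnull marks those rows.
Note: This solution does not find or filter numbers saved as strings, like '1' or '22', because to_numeric converts them successfully.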

print (pd.to_numeric(df['num'], errors='coerce'))
0   -1.48
1    1.70
2   -6.18
3    0.25
4     NaN
5    0.25
Name: num, dtype: float64

print (pd.to_numeric(df['num'], errors='coerce').isnull())
0    False
1    False
2    False
3    False
4     True
5    False
Name: num, dtype: bool

print (df[pd.to_numeric(df['num'], errors='coerce').isnull()])
  N-D     num unit
4  Q5  sum(d)   UD
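One caveat with the mask above is that a value that is already missing (None or NaN) is also reported by isnull, even though nothing failed to convert. A minimal sketch of a variation that flags only failed conversions follows. The helper name non_numeric_rows and the extra notna() condition are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'unit': ['UD', 'UD', 'UD', 'UD', 'UD', 'UD'],
    'N-D': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6'],
    'num': [-1.48, 1.7, -6.18, 0.25, 'sum(d)', 0.25],
})


def non_numeric_rows(frame, column):
    """Hypothetical helper: rows whose value in `column` cannot be parsed as a number."""
    # Values that cannot be parsed become NaN instead of raising an error.
    converted = pd.to_numeric(frame[column], errors='coerce')
    # Keep rows where the conversion failed but the original value was not missing.
    mask = converted.isna() & frame[column].notna()
    return frame[mask]


result = non_numeric_rows(df, 'num')
print(result)

# Numeric strings such as '22' still convert, so they are not flagged,
# and the already-missing None is skipped by the notna() condition.
other = pd.DataFrame({'num': [1.5, '22', 'abc', None]})
other_result = non_numeric_rows(other, 'num')
print(other_result)
```

If missing values should count as non-numeric for your data, drop the notna() condition and the mask reduces to the one shown in the answer.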

Another solution with isinstance and apply:

print (df[df['num'].apply(lambda x: isinstance(x, str))])
  N-D     num unit
4  Q5  sum(d)   UD
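The isinstance check above only catches strings. If the column might also hold other non-numeric objects (None, lists, and so on), one possible variation is to test against numbers.Number from the standard library and invert the mask. This is a sketch under that assumption, not the answer's own method; the variable names is_number and non_numeric are illustrative.

```python
import numbers

import pandas as pd

df = pd.DataFrame({
    'unit': ['UD', 'UD', 'UD', 'UD', 'UD', 'UD'],
    'N-D': ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6'],
    'num': [-1.48, 1.7, -6.18, 0.25, 'sum(d)', 0.25],
})

# True where the stored Python object is a number (int, float, complex, ...).
# Note that bool is a subclass of int, so True/False also count as numbers here.
is_number = df['num'].apply(lambda x: isinstance(x, numbers.Number))

# Invert the mask to keep every column of the non-numeric rows.
non_numeric = df[~is_number]
print(non_numeric)

# Unlike the to_numeric approach, this checks the type rather than the content,
# so a number stored as a string ('22') is reported as non-numeric.
mixed = pd.Series([1, '22', None, 2.5])
mixed_flags = mixed.apply(lambda x: isinstance(x, numbers.Number))
print(mixed[~mixed_flags])
```

Which approach fits depends on whether numeric-looking strings should be treated as numbers: to_numeric accepts them, while the type check rejects them.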
jezrael answered Sep 26 '22 14:09
