I am trying to filter out records whose field_A is null or empty string in the data frame like below:
my_df[my_df.editions is not None]
my_df.shape
This gives me error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-40-e1969e0af259> in <module>()
1 my_df['editions'] = my['editions'].astype(str)
----> 2 my_df = my_df[my_df.editions is not None]
3 my_df.shape
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
1995 return self._getitem_multilevel(key)
1996 else:
-> 1997 return self._getitem_column(key)
1998
1999 def _getitem_column(self, key):
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
2002 # get column
2003 if self.columns.is_unique:
-> 2004 return self._get_item_cache(key)
2005
2006 # duplicate columns & possible reduce dimensionality
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
1348 res = cache.get(item)
1349 if res is None:
-> 1350 values = self._data.get(item)
1351 res = self._box_item_values(item, values)
1352 cache[item] = res
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
3288
3289 if not isnull(item):
-> 3290 loc = self.items.get_loc(item)
3291 else:
3292 indexer = np.arange(len(self.items))[isnull(self.items)]
/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
1945 return self._engine.get_loc(key)
1946 except KeyError:
-> 1947 return self._engine.get_loc(self._maybe_cast_indexer(key))
1948
1949 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()
KeyError: True
or
my_df[my_df.editions != None]
my_df.shape
This one gave no error but didn't filter out any None values.
I also tried:
my_df = my_df[my_df.editions.notnull()]
This one doesn't give error but doesn't filter out any None values either.
Could anyone please advise how to solve this problem? Thanks!
You can filter out rows with NAN value from pandas DataFrame column string, float, datetime e.t.c by using DataFrame. dropna() and DataFrame. notnull() methods. Python doesn't support Null hence any missing data is represented as None or NaN.
Checking for missing values using isnull() and notnull() In order to check missing values in Pandas DataFrame, we use a function isnull() and notnull(). Both function help in checking whether a value is NaN or not. These function can also be used in Pandas Series in order to find null values in a series.
You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in pandas DataFrame. Not that this expression returns a new DataFrame with selected rows. You can also write the above statement with a variable.
You can negativize a condition while filtering using ~
.
So in your case you should do:
my_df = my_df[~my_df.editions.isnull()]
You can filter out empty strings in your dataframe like this:
df = df[df['str_field'].str.len() > 0]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With