python pandas: filter out records with null or empty string for a given field

Tags: python, pandas, dataframe

I am trying to filter out the records in my data frame whose field (editions in this case) is null or an empty string, like this:

my_df[my_df.editions is not None]
my_df.shape

This gives me the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-40-e1969e0af259> in <module>()
      1 my_df['editions'] = my['editions'].astype(str)
----> 2 my_df = my_df[my_df.editions is not None]
      3 my_df.shape

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1995             return self._getitem_multilevel(key)
   1996         else:
-> 1997             return self._getitem_column(key)
   1998 
   1999     def _getitem_column(self, key):

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   2002         # get column
   2003         if self.columns.is_unique:
-> 2004             return self._get_item_cache(key)
   2005 
   2006         # duplicate columns & possible reduce dimensionality

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1348         res = cache.get(item)
   1349         if res is None:
-> 1350             values = self._data.get(item)
   1351             res = self._box_item_values(item, values)
   1352             cache[item] = res

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   3288 
   3289             if not isnull(item):
-> 3290                 loc = self.items.get_loc(item)
   3291             else:
   3292                 indexer = np.arange(len(self.items))[isnull(self.items)]

/home/edamame/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948 
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()

KeyError: True
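The KeyError: True comes from the expression itself, not from the data. Here is a minimal sketch that uses a small hypothetical my_df in place of the asker's frame. "is not" is Python's identity test, so my_df.editions is not None only asks whether the Series object is None. That yields the single value True, and my_df[True] then looks for a column labelled True.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data frame
my_df = pd.DataFrame({"editions": pd.Series(["1st", None, ""], dtype=object)})

# "is not" compares object identity: the Series object itself is not None,
# so the whole expression collapses to the plain Python bool True.
cond = my_df.editions is not None
print(cond)  # True

# A scalar key in df[...] is treated as a column label, so this looks for
# a column named True, which does not exist.
try:
    my_df[cond]
except KeyError as exc:
    print("KeyError:", exc)
```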

I then tried:

my_df[my_df.editions != None]
my_df.shape

This one gave no error, but it didn't filter out any None values.
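The != None attempt is element-wise, but pandas treats None as a missing value, and a missing value compares unequal to everything. So the mask is True on every row, including the row that holds None. A sketch with a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical example frame with one real None and one empty string
my_df = pd.DataFrame({"editions": pd.Series(["1st", None, ""], dtype=object)})

# pandas treats None as missing, and "!=" against a missing value
# is True for every element, so this mask keeps every row.
mask = my_df.editions != None  # deliberately reproducing the question
print(mask.tolist())       # [True, True, True]
print(my_df[mask].shape)   # (3, 1): nothing was filtered out
```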

I also tried:

my_df = my_df[my_df.editions.notnull()]

This one doesn't give an error either, but it doesn't filter out any None values.
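A likely reason notnull() finds nothing is the first line in the traceback, my_df['editions'] = my['editions'].astype(str). In the pandas versions of that era, astype(str) turned each missing value into its text form ('None' or 'nan'), so nothing was missing any more. Newer releases may preserve missing values instead, so the sketch below uses map(str) to reproduce the old behaviour under that assumption:

```python
import pandas as pd

raw = pd.Series(["1st", None, ""], dtype=object)

# Assumption: this mimics what astype(str) did in older pandas, calling str()
# on every element, so None becomes the literal text "None".
as_text = raw.map(str)
print(as_text.tolist())            # ['1st', 'None', '']
print(as_text.notnull().all())     # True: no missing values are left to filter

# Filtering before converting keeps the missing values detectable.
kept = raw[raw.notnull()].astype(str)
print(kept.tolist())               # ['1st', '']
```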

Could anyone please advise how to solve this problem? Thanks!

280

asked Sep 13 '16 17:09

Edamame

2 Answers

You can negate a condition while filtering by using ~.

So in your case you should do:

my_df = my_df[~my_df.editions.isnull()]
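For reference, ~ flips a boolean Series element-wise, so ~isnull() produces the same mask as notnull(). It removes genuinely missing values but keeps empty strings. A sketch on a hypothetical frame:

```python
import pandas as pd

my_df = pd.DataFrame({"editions": pd.Series(["1st", None, "", "2nd"], dtype=object)})

# ~ inverts the boolean mask, so ~isnull() matches notnull() row for row
mask = ~my_df.editions.isnull()
print(mask.equals(my_df.editions.notnull()))  # True

my_df = my_df[mask]
print(my_df.editions.tolist())  # ['1st', '', '2nd']: the empty string survives
```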

167

answered Sep 19 '22 14:09

Gonzalo Ferreiro Volpi

You can filter out empty strings in your dataframe like this:

df = df[df['str_field'].str.len() > 0]
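This single mask also drops missing values. .str.len() returns NaN for a None entry, and NaN > 0 evaluates to False. Whitespace-only strings still pass, though. The sketch below uses a hypothetical df and, as an optional extra not in the answer, strips whitespace first so that blank strings count as empty too:

```python
import pandas as pd

df = pd.DataFrame({"str_field": pd.Series(["abc", "", None, "   "], dtype=object)})

# .str.len() gives NaN for missing values, and NaN > 0 is False,
# so this mask already drops both None and "".
print((df["str_field"].str.len() > 0).tolist())  # [True, False, False, True]

# To also treat whitespace-only strings as empty, strip before measuring.
non_blank = df["str_field"].str.strip().str.len() > 0
print(df.loc[non_blank, "str_field"].tolist())   # ['abc']
```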

answered Sep 19 '22 14:09

StackG
