I want to find all values in a Pandas dataframe that contain whitespace (any arbitrary amount) and replace those values with NaNs.
Any ideas how this can be improved?
Basically I want to turn this:
                   A    B  C
2000-01-01 -0.532681  foo  0
2000-01-02  1.490752  bar  1
2000-01-03 -1.387326  foo  2
2000-01-04  0.814772  baz
2000-01-05 -0.222552       4
2000-01-06 -1.176781  qux
Into this:
                   A    B    C
2000-01-01 -0.532681  foo    0
2000-01-02  1.490752  bar    1
2000-01-03 -1.387326  foo    2
2000-01-04  0.814772  baz  NaN
2000-01-05 -0.222552  NaN    4
2000-01-06 -1.176781  qux  NaN
I've managed to do it with the code below, but man is it ugly. It's not Pythonic and I'm sure it's not the most efficient use of pandas either. I loop through each column and do boolean replacement against a column mask generated by applying a function that does a regex search of each value, matching on whitespace.
for i in df.columns:
    df[i][df[i].apply(lambda i: True if re.search(r'^\s*$', str(i)) else False)] = None
It could be optimized a bit by only iterating through fields that could contain empty strings:
if df[i].dtype == np.dtype('object')
But that's not much of an improvement.
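In context, that dtype check might look something like this (just a rough sketch of the same loop, not claiming it's any prettier):

import re
import numpy as np

for i in df.columns:
    # only object columns can hold strings, so skip the numeric ones
    if df[i].dtype == np.dtype('object'):
        mask = df[i].apply(lambda v: bool(re.search(r'^\s*$', str(v))))
        df.loc[mask, i] = None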
And finally, this code sets the target strings to None, which works with pandas functions like fillna(), but it would be nice for completeness if I could actually insert a NaN directly instead of None.
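For what it's worth, something like DataFrame.mask does seem to let me drop NaN in directly instead of None, though I'm not sure it's any more idiomatic:

import numpy as np

# build a boolean frame marking whitespace-only strings, then mask them with NaN
# (applymap is element-wise; newer pandas also exposes this as DataFrame.map)
is_blank = df.applymap(lambda v: isinstance(v, str) and v.strip() == '')
df = df.mask(is_blank, np.nan)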
To replace blank values (whitespace) with NaN in pandas, we can call replace on the data frame: after creating the df data frame, we replace all whitespace-only values with NaN by calling replace with a regex that matches whitespace, np.nan as the replacement value, and regex set to True.
str.lstrip() removes spaces from the left side of a string, str.rstrip() removes them from the right side, and str.strip() removes them from both sides. pandas provides Series methods with the same names as Python's built-in string functions through the .str accessor, so they can be applied to a whole column at once.
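As a rough sketch of that idea (assuming the df from the question), you could strip each string and treat anything that comes out empty as missing:

import numpy as np

# .str.strip() works element-wise on object columns; non-string values become NaN
# after stripping, so they never compare equal to '' and are left alone
for col in df.select_dtypes(include='object').columns:
    stripped = df[col].str.strip()
    df.loc[stripped == '', col] = np.nan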
I think df.replace() does the job, since pandas 0.13:
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [-0.532681, 'foo', 0],
    [1.490752, 'bar', 1],
    [-1.387326, 'foo', 2],
    [0.814772, 'baz', ' '],
    [-0.222552, ' ', 4],
    [-1.176781, 'qux', ' '],
], columns='A B C'.split(), index=pd.date_range('2000-01-01', '2000-01-06'))

# replace field that's entirely space (or empty) with NaN
print(df.replace(r'^\s*$', np.nan, regex=True))
Produces:
                   A    B    C
2000-01-01 -0.532681  foo    0
2000-01-02  1.490752  bar    1
2000-01-03 -1.387326  foo    2
2000-01-04  0.814772  baz  NaN
2000-01-05 -0.222552  NaN    4
2000-01-06 -1.176781  qux  NaN
As Temak pointed out, use df.replace(r'^\s+$', np.nan, regex=True) if the empty string is valid data you want to keep: \s* also matches a completely empty field, while \s+ requires at least one whitespace character.
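As a quick illustration of the difference, with a small series containing both a genuinely empty string and a whitespace-only string:

s = pd.Series(['foo', '', '   '])
print(s.replace(r'^\s*$', np.nan, regex=True))  # both '' and '   ' become NaN
print(s.replace(r'^\s+$', np.nan, regex=True))  # only '   ' becomes NaN; '' is kept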