
Remove entries with NaN values in a Python dictionary

I have the following dictionary in Python:

OrderedDict([(30, ('A1', 55.0)), (31, ('A2', 125.0)), (32, ('A3', 180.0)), (43, ('A4', nan))])

Is there a way to remove the entries where any of the values is NaN? I tried this:

{k: dict_cg[k] for k in dict_cg.values() if not np.isnan(k)}

It would be great if the solution works for both Python 2 and Python 3.

user308827 asked Jun 26 '18 05:06




1 Answer

Since you have pandas, you can leverage pandas' pd.Series.notna function here (pd.Series.notnull is an alias for it), which works with mixed dtypes.

>>> import pandas as pd
>>> {k: v for k, v in dict_cg.items() if pd.Series(v).notna().all()}
{30: ('A1', 55.0), 31: ('A2', 125.0), 32: ('A3', 180.0)}
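
Since the question asks for something that runs on both Python 2 and Python 3, here is a minimal sketch that avoids pandas altogether. It is a different technique from the one above, not a drop-in equivalent: it uses math.isnan from the standard library, and the helper names has_nan and drop_nan_entries are made up for illustration. It assumes the NaN values are Python floats (or subclasses of float); other numeric types, such as numpy.float32, would not be caught by the isinstance check.

```python
import math
from collections import OrderedDict


def has_nan(values):
    """Return True if any element of `values` is a float NaN.

    The isinstance check skips non-float elements such as the 'A1'
    labels, which math.isnan would otherwise reject with a TypeError.
    """
    return any(isinstance(v, float) and math.isnan(v) for v in values)


def drop_nan_entries(d):
    """Return a new mapping of the same type as `d`, keeping only the
    entries whose value tuple contains no NaN."""
    return type(d)((k, v) for k, v in d.items() if not has_nan(v))


dict_cg = OrderedDict([
    (30, ('A1', 55.0)),
    (31, ('A2', 125.0)),
    (32, ('A3', 180.0)),
    (43, ('A4', float('nan'))),
])

cleaned = drop_nan_entries(dict_cg)
print(list(cleaned.items()))
# [(30, ('A1', 55.0)), (31, ('A2', 125.0)), (32, ('A3', 180.0))]
```

Building the result with type(d)(...) keeps the OrderedDict type and its insertion order; a plain dict comprehension would also work if the ordering does not matter.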

This is not part of the answer, but may help you understand how I've arrived at the solution. I came across some weird behaviour when trying to solve this question, using pd.notnull directly.

Take dict_cg[43].

>>> dict_cg[43]
('A4', nan)

pd.notnull does not work.

>>> pd.notnull(dict_cg[43])
True

It treats the tuple as a single value (rather than an iterable of values). Furthermore, converting this to a list and then testing also gives an incorrect answer.

>>> pd.notnull(list(dict_cg[43]))
array([ True,  True])

Since the second value is nan, the result I'm looking for should be [True, False]. It finally works when you pre-convert to a Series:

>>> pd.Series(dict_cg[43]).notnull() 
0     True
1    False
dtype: bool

So, the solution is to Series-ify it and then test the values.

Along similar lines, another (admittedly roundabout) solution is to pre-convert to an object dtype numpy array, and pd.notnull will work directly:

>>> import numpy as np
>>> pd.notnull(np.array(dict_cg[43], dtype=object))
array([ True, False])

I imagine that pd.notnull converts the list to a string array under the covers, rendering the NaN as the string "nan", so it is no longer a "null" value.

cs95 answered Sep 17 '22 22:09
