
How to remove NaN from a Pandas Series where the dtype is a list?

I have a pandas.Series with dtype object in which every element is a list. E.g.

>>> import numpy as np
>>> import pandas as pd
>>> x = pd.Series([[1,2,3], [2,np.nan], [3,4,5,np.nan], [np.nan]])
>>> x
0         [1, 2, 3]
1          [2, nan]
2    [3, 4, 5, nan]
3             [nan]
dtype: object

How do I remove the nan in the lists for each row?

The desired output would be:

>>> x
0         [1, 2, 3]
1               [2]
2         [3, 4, 5]
3                []
dtype: object

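This works, although the integers come back as floats, because an intermediate Series that contains NaN is upcast to float64: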

>>> x.apply(lambda y: pd.Series(y).dropna().values.tolist())
0          [1, 2, 3]
1              [2.0]
2    [3.0, 4.0, 5.0]
3                 []
dtype: object

Is there a simpler method than using a lambda that converts each list to a Series, drops the NaN values, and then extracts the values back into a list?

Asked Jan 04 '17 at 06:01 by alvas



2 Answers

You can use a list comprehension with pandas.notnull to remove the NaN values:

print(x.apply(lambda y: [a for a in y if pd.notnull(a)]))
0    [1, 2, 3]
1          [2]
2    [3, 4, 5]
3           []
dtype: object
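
If the goal is mainly to avoid the inline lambda, the same comprehension can be moved into a named function and passed to apply directly. This is only a sketch: `drop_nan` is a hypothetical helper name, not something from pandas.

```python
import numpy as np
import pandas as pd


def drop_nan(values):
    """Return a new list without missing entries (hypothetical helper)."""
    # pd.notnull is False for NaN and for None, so both are dropped.
    return [v for v in values if pd.notnull(v)]


x = pd.Series([[1, 2, 3], [2, np.nan], [3, 4, 5, np.nan], [np.nan]])

# apply calls drop_nan once per element; each element is a plain Python list.
cleaned = x.apply(drop_nan)
print(cleaned)
```

Because the elements are never converted to a float array, the integers stay integers.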

Another solution uses filter with the condition v == v, which is False only for NaN:

print(x.apply(lambda a: list(filter(lambda v: v == v, a))))
0    [1, 2, 3]
1          [2]
2    [3, 4, 5]
3           []
dtype: object
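
The self-comparison trick relies on the IEEE 754 rule that NaN is not equal to anything, including itself. A minimal sketch of that behaviour on a single list (the variable names are illustrative only):

```python
import numpy as np

nan = float("nan")

# NaN compares unequal to itself; ordinary numbers compare equal to themselves.
nan_equals_itself = nan == nan   # False
two_equals_itself = 2 == 2       # True

values = [3, 4, 5, np.nan]

# Keeping only the items that equal themselves drops the NaN entries.
kept = [v for v in values if v == v]
print(kept)
```

One caveat: None == None is True, so this check does not remove None, whereas pd.notnull does.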

Thank you DYZ for another solution (note that np.isfinite also drops inf and -inf, and it only works on numeric values):

print(x.apply(lambda y: list(filter(np.isfinite, y))))
0    [1, 2, 3]
1          [2]
2    [3, 4, 5]
3           []
dtype: object
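
np.isfinite and a NaN-only check agree on the example data, but they differ as soon as a list contains infinities. The following sketch uses a made-up list to show the difference:

```python
import numpy as np

# Made-up sample containing both infinities and a NaN.
values = [1.0, np.inf, np.nan, -np.inf]

# np.isfinite is False for NaN, inf and -inf, so all three are removed.
finite_only = list(filter(np.isfinite, values))

# np.isnan is True only for NaN, so the infinities survive.
non_nan = [v for v in values if not np.isnan(v)]

print(finite_only)
print(non_nan)
```

If infinities are legitimate values in the data, a NaN-only check is probably the safer choice.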
Answered Sep 22 '22 at 01:09 by jezrael


A simple NumPy solution with a list comprehension (each element is returned as a NumPy array, not a list):

pd.Series([np.array(e)[~np.isnan(e)] for e in x.values])
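
If plain lists are needed, as in the desired output of the question, one option is to call .tolist() on each filtered array. This is a sketch of that variation rather than part of the original answer:

```python
import numpy as np
import pandas as pd

x = pd.Series([[1, 2, 3], [2, np.nan], [3, 4, 5, np.nan], [np.nan]])

# The comprehension from the answer: every element is a NumPy array.
as_arrays = pd.Series([np.array(e)[~np.isnan(e)] for e in x.values])

# Variation: convert each filtered array back to a Python list.
as_lists = pd.Series([np.array(e)[~np.isnan(e)].tolist() for e in x.values])

print(as_lists)
```

Lists that contained NaN pass through a float array, so their remaining values come back as floats (for example [2.0]), and np.isnan raises a TypeError on non-numeric elements such as strings.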
Answered Sep 19 '22 at 01:09 by CentAu