How to remove NaN from a Pandas Series where the dtype is a list?

Tags: python, list, pandas, nan, numpy

I have a pandas.Series with dtype object in which each row is a list. For example:

>>> import numpy as np
>>> import pandas as pd
>>> x = pd.Series([[1,2,3], [2,np.nan], [3,4,5,np.nan], [np.nan]])
>>> x
0         [1, 2, 3]
1          [2, nan]
2    [3, 4, 5, nan]
3             [nan]
dtype: object

How do I remove the nan values from the list in each row?

The desired output would be:

>>> x
0         [1, 2, 3]
1               [2]
2         [3, 4, 5]
3                []
dtype: object

This works:

>>> x.apply(lambda y: pd.Series(y).dropna().values.tolist())
0          [1, 2, 3]
1              [2.0]
2    [3.0, 4.0, 5.0]
3                 []
dtype: object

Is there a simpler method than using a lambda, converting the list to a Series, dropping the NaN values and then extracting the values back into a list again?

asked Jan 04 '17 06:01

alvas

2 Answers

You can use a list comprehension with pandas.notnull to remove the NaN values:

print(x.apply(lambda y: [a for a in y if pd.notnull(a)]))
0    [1, 2, 3]
1          [2]
2    [3, 4, 5]
3           []
dtype: object
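
As a self-contained sketch of the same idea, the lambda can be pulled out into a named function. The helper name drop_nan is hypothetical and not part of pandas; it assumes every row is a plain Python list:

```python
import numpy as np
import pandas as pd


def drop_nan(values):
    """Return a new list without missing values (NaN or None)."""
    return [v for v in values if pd.notnull(v)]


x = pd.Series([[1, 2, 3], [2, np.nan], [3, 4, 5, np.nan], [np.nan]])

# apply() calls drop_nan once per row; the integers stay integers
# because the lists are never converted to a float array.
cleaned = x.apply(drop_nan)
print(cleaned.tolist())
```

Because each list is filtered element by element, the values are not cast to float, unlike the pd.Series(y).dropna() approach in the question.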

Another solution uses filter with the condition v == v, which is False only for NaN (NaN is the only value for which v != v):

print(x.apply(lambda a: list(filter(lambda v: v == v, a))))
0    [1, 2, 3]
1          [2]
2    [3, 4, 5]
3           []
dtype: object
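
The self-comparison trick relies on NaN never comparing equal to itself. A minimal sketch of that behaviour on a single row (the variable names are illustrative only):

```python
nan = float("nan")

# NaN is not equal to itself, so the test v == v is False only for NaN.
nan_equals_itself = nan == nan

row = [3, 4, 5, nan]
kept = [v for v in row if v == v]

# Unlike pd.notnull, this test does not drop None, because None == None is True.
row_with_none = [1, None, nan]
kept_with_none = [v for v in row_with_none if v == v]

print(nan_equals_itself, kept, kept_with_none)
```

If the lists may also contain None, pd.notnull is probably the safer filter.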

Thanks to DYZ for another solution, using np.isfinite (note that this also drops inf and -inf, not only NaN):

print(x.apply(lambda y: list(filter(np.isfinite, y))))
0    [1, 2, 3]
1          [2]
2    [3, 4, 5]
3           []
dtype: object
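
One caveat worth checking before choosing np.isfinite: it rejects infinities as well as NaN. A small sketch of the difference on a made-up row that contains both:

```python
import numpy as np

row = [1.0, np.inf, np.nan, -np.inf, 2.0]

# np.isfinite is False for NaN and for +/- infinity.
only_finite = list(filter(np.isfinite, row))

# The v == v test removes NaN but keeps the infinities.
without_nan = [v for v in row if v == v]

print(only_finite)
print(without_nan)
```

If infinities are valid data in the lists, the pd.notnull or v == v variants keep them.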

answered Sep 22 '22 01:09

jezrael

A simple numpy solution with a list comprehension (note that each row becomes a numpy array rather than a list):

pd.Series([np.array(e)[~np.isnan(e)] for e in x.values])
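
If plain lists are needed rather than arrays, one possible variant converts each filtered array back with tolist(). This is a sketch, not the answerer's code; the helper name drop_nan_np is hypothetical, and it assumes all values are numeric, so they come back as floats:

```python
import numpy as np
import pandas as pd

x = pd.Series([[1, 2, 3], [2, np.nan], [3, 4, 5, np.nan], [np.nan]])


def drop_nan_np(values):
    """Filter NaN with a boolean mask and return a plain list of floats."""
    arr = np.asarray(values, dtype=float)  # casts ints to float
    return arr[~np.isnan(arr)].tolist()


cleaned = pd.Series([drop_nan_np(e) for e in x.values])
print(cleaned.tolist())
```

Note that, as with the approach in the question, the integers are cast to float along the way.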

answered Sep 19 '22 01:09

CentAu
