 

Move non-empty cells to the left in pandas DataFrame

Tags:

python

pandas

Suppose I have data of the form:

Name    h1    h2    h3    h4
A       1     nan   2     3
B       nan   nan   1     3
C       1     3     2     nan

I want to move all non-NaN cells to the left (or, equivalently, collect all non-NaN values into new columns) while preserving their left-to-right order, to get:

Name    h1    h2    h3    h4
A       1     2     3     nan
B       1     3     nan   nan
C       1     3     2     nan

I can of course do this row by row, but I'd like to know whether there are other approaches with better performance.
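For reference, here is a minimal sketch of the row-by-row approach the question mentions, which the answers below aim to beat. It assumes all data columns are numeric (so they can be converted to `float`) and that `Name` is the index; the function name `push_left_rowwise` is hypothetical.

```python
import numpy as np
import pandas as pd

def push_left_rowwise(df: pd.DataFrame) -> pd.DataFrame:
    """Drop each row's NaNs and pad the row with NaN on the right."""
    width = df.shape[1]
    rows = []
    for values in df.to_numpy(dtype=float):     # one NumPy row per DataFrame row
        kept = values[~np.isnan(values)]        # non-NaN values, left-to-right order kept
        padded = np.full(width, np.nan)         # start from an all-NaN row
        padded[:kept.size] = kept               # fill from the left
        rows.append(padded)
    return pd.DataFrame(rows, index=df.index, columns=df.columns)

# The question's example data, with "Name" as the index
df = pd.DataFrame(
    {"h1": [1, np.nan, 1], "h2": [np.nan, np.nan, 3],
     "h3": [2, 1, 2], "h4": [3, 3, np.nan]},
    index=pd.Index(["A", "B", "C"], name="Name"),
)
print(push_left_rowwise(df))
```

The Python-level loop is what makes this slow on large frames: every row is handled separately instead of in one vectorized NumPy operation.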

asked Aug 18 '15 by Lelouch




2 Answers

First, create a boolean mask with `np.isnan`, which marks NaN as True and non-NaN values as False. Then argsort each row of the mask: False sorts before True, so the NaNs are pushed to the right, and with a stable sort the non-NaN values keep their original left-to-right order.

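import numpy as np
import pandas as pd

# df is the question's frame, with "Name" as the index
idx = np.isnan(df.values).argsort(axis=1, kind="stable")
df = pd.DataFrame(
    df.values[np.arange(df.shape[0])[:, None], idx],
    index=df.index,
    columns=df.columns,
)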

       h1   h2   h3  h4
Name
A     1.0  2.0  3.0 NaN
B     1.0  3.0  NaN NaN
C     1.0  3.0  2.0 NaN
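The same idea can be packaged as a reusable function. This sketch makes two assumptions beyond the answer: it uses `np.take_along_axis` (NumPy 1.15+) in place of the explicit row-index array, and it requests `kind="stable"` explicitly, because NumPy's default `argsort` kind (`"quicksort"`) is not guaranteed to keep equal keys in their original order. The function name `push_nan_right` is hypothetical, and the data columns are assumed to be numeric.

```python
import numpy as np
import pandas as pd

def push_nan_right(df: pd.DataFrame) -> pd.DataFrame:
    """Left-align the non-NaN cells of every row, keeping their order."""
    values = df.to_numpy(dtype=float)
    # Sort each row's NaN mask: False (0) comes before True (1); the stable
    # sort keeps tied positions in their original column order.
    idx = np.isnan(values).argsort(axis=1, kind="stable")
    # take_along_axis picks values[i, idx[i, j]] for every (i, j)
    shifted = np.take_along_axis(values, idx, axis=1)
    return pd.DataFrame(shifted, index=df.index, columns=df.columns)

# The question's example data, with "Name" as the index
df = pd.DataFrame(
    {"h1": [1, np.nan, 1], "h2": [np.nan, np.nan, 3],
     "h3": [2, 1, 2], "h4": [3, 3, np.nan]},
    index=pd.Index(["A", "B", "C"], name="Name"),
)
print(push_nan_right(df))
```

Because `np.isnan` raises a `TypeError` on object arrays, frames with non-numeric columns would need a different mask, for example `df.isna().to_numpy()`.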

Details

np.isnan(df.values)
# array([[False,  True, False, False],
#        [ True,  True, False, False],
#        [False, False, False,  True]])

# False ⟶ 0, True ⟶ 1
# When sorted, all True values (i.e. NaNs) are pushed to the right;
# kind="stable" keeps equal values in their original order.

idx = np.isnan(df.values).argsort(axis=1, kind="stable")
# array([[0, 2, 3, 1],
#        [2, 3, 0, 1],
#        [0, 1, 2, 3]], dtype=int64)

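# Now index `df.values` with a column of row numbers, shape (n_rows, 1), and `idx`;
# the two broadcast, so element (i, j) of the result is df.values[i, idx[i, j]]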
df.values[np.arange(df.shape[0])[:, None], idx]
# array([[ 1.,  2.,  3., nan],
#        [ 1.,  3., nan, nan],
#        [ 1.,  3.,  2., nan]])

# Make that as a DataFrame
df = pd.DataFrame(
    df.values[np.arange(df.shape[0])[:, None], idx],
    index=df.index,
    columns=df.columns,
)

#        h1   h2   h3  h4
# Name
# A     1.0  2.0  3.0 NaN
# B     1.0  3.0  NaN NaN
# C     1.0  3.0  2.0 NaN
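The indexing step above is the least obvious part. A tiny stand-alone example with made-up numbers (`a` and `idx` are illustrative, not the question's data) shows how the `(n_rows, 1)` row-number array broadcasts against `idx`, and that the result matches `np.take_along_axis`:

```python
import numpy as np

a = np.array([[10, 11, 12],
              [20, 21, 22]])
idx = np.array([[2, 0, 1],
                [1, 2, 0]])

rows = np.arange(a.shape[0])[:, None]   # shape (2, 1): [[0], [1]]
# rows broadcasts against idx (shape (2, 3)), so picked[i, j] == a[i, idx[i, j]]
picked = a[rows, idx]
print(picked)
# [[12 10 11]
#  [21 22 20]]

# The same selection, spelled with take_along_axis
print(np.array_equal(picked, np.take_along_axis(a, idx, axis=1)))  # True
```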
answered Nov 04 '22 by Ch3steR


Here's what I did:

I stacked your DataFrame into a long format, then grouped it by the Name level. Within each group I drop the NaNs and relabel the remaining values as h1, h2, and so on, in order; reindexing against the full set of (Name, h1 through h4) pairs then re-creates the NaNs on the right.

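from io import StringIO
import pandas

def defragment(x):
    # x holds one row's values in long format; dropna() removes the gaps
    values = x.dropna().values
    # relabel the surviving values h1, h2, ... (uses the global df's columns)
    return pandas.Series(values, index=df.columns[:len(values)])

datastring = StringIO("""\
Name    h1    h2    h3    h4
A       1     nan   2     3
B       nan   nan   1     3
C       1     3     2     nan""")

# raw string so "\s" is a regex escape, not an invalid string escape
df = pandas.read_table(datastring, sep=r'\s+').set_index('Name')
# every (Name, column) pair, used to restore the trailing NaNs
long_index = pandas.MultiIndex.from_product([df.index, df.columns])

print(
    df.stack()                  # wide -> long
      .groupby(level='Name')
      .apply(defragment)        # left-align each row's values
      .reindex(long_index)      # missing (Name, column) pairs become NaN
      .unstack()                # long -> wide
)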

And so I get:

   h1  h2  h3  h4
A   1   2   3 NaN
B   1   3 NaN NaN
C   1   3   2 NaN
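For comparison, here is a sketch of the same stack-then-unstack idea that avoids `apply` and the global `df` inside `defragment`. It is a swapped-in technique, not the answer's code: it numbers the surviving cells of each row with `GroupBy.cumcount` and uses those positions as the new column labels. The function name `push_left_cumcount` is hypothetical, and `.dropna()` is called explicitly because whether `stack()` drops NaNs by itself depends on the pandas version.

```python
import numpy as np
import pandas as pd

def push_left_cumcount(df: pd.DataFrame) -> pd.DataFrame:
    """Left-align non-NaN cells by renumbering them within each row."""
    long = df.stack().dropna()                  # one entry per non-NaN cell, row-major order
    names = long.index.get_level_values(0)      # the row label of each cell
    pos = long.groupby(level=0).cumcount()      # 0, 1, 2, ... within each row label
    new_index = pd.MultiIndex.from_arrays([names, pos.to_numpy()])
    wide = pd.Series(long.to_numpy(), index=new_index).unstack()
    wide.columns = df.columns[: wide.shape[1]]  # positions 0, 1, ... -> h1, h2, ...
    # Restore the original row order and add any all-NaN trailing columns
    return wide.reindex(index=df.index, columns=df.columns)

# The question's example data, with "Name" as the index
df = pd.DataFrame(
    {"h1": [1, np.nan, 1], "h2": [np.nan, np.nan, 3],
     "h3": [2, 1, 2], "h4": [3, 3, np.nan]},
    index=pd.Index(["A", "B", "C"], name="Name"),
)
print(push_left_cumcount(df))
```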
answered Nov 04 '22 by Paul H