Suppose I have data of the form
Name  h1   h2   h3  h4
A     1    nan  2   3
B     nan  nan  1   3
C     1    3    2   nan
I want to move all non-nan cells to the left (or collect all non-nan data in new columns) while preserving the order from left to right, getting
Name  h1  h2   h3   h4
A     1   2    3    nan
B     1   3    nan  nan
C     1   3    2    nan
I can of course do this row by row, but I would like to know whether there is a faster way.
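For reference, the row-by-row approach mentioned above might look like the following minimal sketch. The helper name `shift_left_rowwise` is hypothetical, and the frame is rebuilt from the sample data with `Name` assumed to be the index; it is meant only as a slow baseline to compare faster methods against.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with Name as the index (an assumption).
df = pd.DataFrame(
    [[1, np.nan, 2, 3], [np.nan, np.nan, 1, 3], [1, 3, 2, np.nan]],
    index=pd.Index(["A", "B", "C"], name="Name"),
    columns=["h1", "h2", "h3", "h4"],
)


def shift_left_rowwise(frame):
    """Slow baseline: rebuild each row with its non-NaN values first."""
    rows = []
    for _, row in frame.iterrows():
        kept = row.dropna().to_numpy()            # non-NaN values, original order
        padded = np.full(frame.shape[1], np.nan)  # start from an all-NaN row
        padded[: len(kept)] = kept                # write the kept values on the left
        rows.append(padded)
    return pd.DataFrame(rows, index=frame.index, columns=frame.columns)


baseline = shift_left_rowwise(df)
print(baseline)
```

The Python-level loop over rows is what makes this slow on large frames.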
Use the pandas dropna() method. It drops rows or columns that contain null values, with several options for how to do so. On a whole DataFrame it removes entire rows or columns rather than shifting cells, so it has to be applied per row to help here.
The pandas Series.shift() function shifts the index by the desired number of periods, with an optional time freq. When freq is not passed, the index is shifted without realigning the data.
By default, pandas treats strings such as #N/A, -NaN, -n/a, N/A and NULL as NaN when reading data.
bfill() backward-fills the missing values in a DataFrame, and ffill() forward-fills them.
First, create a boolean array using np.isnan; this marks NaN as True and non-NaN values as False. Then argsort each row so that the NaNs are pushed to the right. The order of the non-NaN values is maintained as long as the sort is stable.
idx = np.isnan(df.values).argsort(axis=1, kind="stable")  # stable sort keeps the left-to-right order
df = pd.DataFrame(
df.values[np.arange(df.shape[0])[:, None], idx],
index=df.index,
columns=df.columns,
)
       h1   h2   h3   h4
Name
A     1.0  2.0  3.0  NaN
B     1.0  3.0  NaN  NaN
C     1.0  3.0  2.0  NaN
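The same idea can be wrapped in a self-contained function, sketched below under a few assumptions: the helper name `push_nan_right` is hypothetical, `DataFrame.isna()` is used in place of `np.isnan` so that object-dtype columns do not raise, and `np.take_along_axis` replaces the explicit row-index array.

```python
import numpy as np
import pandas as pd


def push_nan_right(frame):
    """Move non-missing values to the left of each row, keeping their order."""
    values = frame.to_numpy()
    mask = frame.isna().to_numpy()  # True where a cell is missing
    # A stable sort keeps equal keys (all the False cells) in their original order.
    order = np.argsort(mask, axis=1, kind="stable")
    shifted = np.take_along_axis(values, order, axis=1)
    return pd.DataFrame(shifted, index=frame.index, columns=frame.columns)


# Sample data from the question, with Name as the index (an assumption).
df = pd.DataFrame(
    [[1, np.nan, 2, 3], [np.nan, np.nan, 1, 3], [1, 3, 2, np.nan]],
    index=pd.Index(["A", "B", "C"], name="Name"),
    columns=["h1", "h2", "h3", "h4"],
)
result = push_nan_right(df)
print(result)
```

Returning a new frame instead of reassigning `df` keeps the original data available for comparison.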
np.isnan(df.values)
# array([[False, True, False, False],
# [ True, True, False, False],
# [False, False, False, True]])
# False ⟶ 0 True ⟶ 1
# When sorted all True values i.e nan are pushed to the right.
idx = np.isnan(df.values).argsort(axis=1, kind="stable")  # stable sort keeps the left-to-right order
# array([[0, 2, 3, 1],
# [2, 3, 0, 1],
# [0, 1, 2, 3]], dtype=int64)
# Now, indexing `df.values` using `idx`
df.values[np.arange(df.shape[0])[:, None], idx]
# array([[ 1., 2., 3., nan],
# [ 1., 3., nan, nan],
# [ 1., 3., 2., nan]])
# Make that as a DataFrame
df = pd.DataFrame(
df.values[np.arange(df.shape[0])[:, None], idx],
index=df.index,
columns=df.columns,
)
# h1 h2 h3 h4
# Name
# A 1.0 2.0 3.0 NaN
# B 1.0 3.0 NaN NaN
# C 1.0 3.0 2.0 NaN
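One caveat with the argsort step: NumPy's default sort kind is quicksort, which is not guaranteed to be stable, so on wider rows the non-NaN values could in principle come back reordered. Passing kind="stable" makes the ordering guarantee explicit. The sketch below checks this on a single wide row; the row contents are made-up illustrative data, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, illustrative data only
row = rng.permutation(200).astype(float)  # 200 distinct values in random order
row[::3] = np.nan                         # blank out every third cell

# Sort on the NaN mask only; a stable sort leaves the False entries in order.
order = np.argsort(np.isnan(row), kind="stable")
shifted = row[order]

n_valid = int((~np.isnan(row)).sum())
print(np.array_equal(shifted[:n_valid], row[~np.isnan(row)]))
```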
Here's what I did:
I stacked your dataframe into a long format, then grouped by the Name level. Within each group, I drop the NaNs and relabel the remaining values with the leading column names, then reindex to the full h1 through h4 set, which re-creates your NaNs on the right.
from io import StringIO
import pandas

def defragment(x):
    # Keep the non-NaN values and label them with the leading column names.
    values = x.dropna().values
    return pandas.Series(values, index=df.columns[:len(values)])

datastring = StringIO("""\
Name h1 h2 h3 h4
A 1 nan 2 3
B nan nan 1 3
C 1 3 2 nan""")
df = pandas.read_table(datastring, sep=r'\s+').set_index('Name')
long_index = pandas.MultiIndex.from_product([df.index, df.columns])
print(
    df.stack()
    .groupby(level='Name')
    .apply(defragment)
    .reindex(long_index)
    .unstack()
)
And so I get:
   h1  h2   h3   h4
A   1   2    3  NaN
B   1   3  NaN  NaN
C   1   3    2  NaN