I have a DataFrame where multiple rows span each index. Taking the first index, for example, has such a structure:
df = pd.DataFrame([["A", "first", 1.0, 1.0, np.NaN],
[np.NaN, np.NaN, 2.0, np.NaN, 2.0],
[np.NaN, np.NaN, np.NaN, 3.0, 3.0]],
columns=["ID", "Name", "val1", "val2", "val3"],
index=[0, 0, 0])
Out[4]:
ID Name val1 val2 val3
0 A first 1 1 NaN
0 NaN NaN 2 NaN 2
0 NaN NaN NaN 3 3
I would like to sort/order each column such that the NaN
s are at the bottom of each column at that given index - a result which looks like this:
ID Name val1 val2 val3
0 A first 1 1 2
0 NaN NaN 2 3 3
0 NaN NaN NaN NaN NaN
A more explicit example might look like this:
df = pd.DataFrame([["A", "first", 1.0, 1.0, np.NaN],
[np.NaN, np.NaN, 2.0, np.NaN, 2.0],
[np.NaN, np.NaN, np.NaN, 3.0, 3.0],
["B", "second", 4.0, 4.0, np.NaN],
[np.NaN, np.NaN, 5.0, np.NaN, 5.0],
[np.NaN, np.NaN, np.NaN, 6.0, 6.0]],
columns=[ "ID", "Name", "val1", "val2", "val3"],
index=[0, 0, 0, 1, 1, 1])
Out[5]:
ID Name val1 val2 val3
0 A first 1 1 NaN
0 NaN NaN 2 NaN 2
0 NaN NaN NaN 3 3
1 B second 4 4 NaN
1 NaN NaN 5 NaN 5
1 NaN NaN NaN 6 6
with the desired result to look like this:
ID Name val1 val2 val3
0 A first 1 1 2
0 NaN NaN 2 3 3
0 NaN NaN NaN NaN NaN
1 B second 4 4 5
1 NaN NaN 5 6 6
1 NaN NaN NaN NaN NaN
I have many thousands of rows in this dataframe, with each index containing up to a few hundred rows. My desired result will be very helpful when I to_csv
the dataframe.
I have attempted to use sort_values(['val1','val2','val3'])
on the whole data frame, but this results in the indices becoming disordered. I have tried to iterate through each index and sort in place, but this too does not restrict the NaN
to the bottom of each indices' column. I have also tried to fillna
to another value, such as 0, but I have not been successful here, either.
While I am certainly using it wrong, the na_position
parameter in sort_values
does not produce the desired outcome, though it seems this is likely what want.
Edit:
The final df's index is not required to be in numerical order as in my second example.
By changing ignore_index
to False
in the single line of @Leb's third code block,
pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=True)
to
pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=False)
and by creating a temp df for all rows in a given index, I was able to make this work - not pretty, but it orders things how I need them. If someone (certainly) has a better way, please let me know.
new_df = df.ix[0]
new_df = pd.concat([new_df[col].sort_values().reset_index(drop=True) for col in new_df], axis=1, ignore_index=False)
max_index = df.index[-1]
for i in range(1, max_index + 1):
tmp = df.ix[i]
tmp = pd.concat([tmp[col].sort_values().reset_index(drop=True) for col in tmp], axis=1, ignore_index=False)
new_df = pd.concat([new_df,tmp])
In [10]: new_df
Out[10]:
ID Name val1 val2 val3
0 A first 1 1 2
1 NaN NaN 2 3 3
2 NaN NaN NaN NaN NaN
0 B second 4 4 5
1 NaN NaN 5 6 6
2 NaN NaN NaN NaN NaN
Pandas DataFrame fillna() Method The fillna() method replaces the NULL values with a specified value. The fillna() method returns a new DataFrame object unless the inplace parameter is set to True , in that case the fillna() method does the replacing in the original DataFrame instead.
std() function to calculate the standard deviation of values in a pandas DataFrame. Note that the std() function will automatically ignore any NaN values in the DataFrame when calculating the standard deviation.
I know the issue of pushing nans to an edge has been discussed on github. For your particular frame, I'd probably do it manually at the Python level, and not worry about performance much. Something like
>>> df.groupby(level=0, sort=False).transform(lambda x: sorted(x,key=pd.isnull))
ID Name val1 val2 val3
0 A first 1 1 2
0 NaN NaN 2 3 3
0 NaN NaN NaN NaN NaN
1 B second 4 4 5
1 NaN NaN 5 6 6
1 NaN NaN NaN NaN NaN
should work. Note that since sorted
is a stable sort, and we're using pd.isnull
as the key (where False < True), we push the NaNs to the end while preserving the order of the remaining objects. Also note that here I'm grouping just on the index; we could alternatively have grouped on whatever we wanted.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With