Pandas force NaN to bottom of each column at each index

Tags:

pandas

I have a DataFrame where multiple rows span each index. Taking the first index, for example, has such a structure:

df = pd.DataFrame([["A", "first", 1.0, 1.0, np.NaN],
            [np.NaN, np.NaN, 2.0, np.NaN, 2.0],
            [np.NaN, np.NaN, np.NaN, 3.0, 3.0]],
            columns=["ID", "Name", "val1", "val2", "val3"],
            index=[0, 0, 0])

Out[4]:
    ID   Name  val1  val2  val3
0    A  first     1     1   NaN
0  NaN    NaN     2   NaN     2
0  NaN    NaN   NaN     3     3

I would like to sort/order each column such that the NaNs are at the bottom of each column at that given index - a result which looks like this:

    ID   Name  val1  val2  val3
0    A  first     1     1     2
0  NaN    NaN     2     3     3
0  NaN    NaN   NaN     NaN   NaN

A more explicit example might look like this:

df = pd.DataFrame([["A", "first", 1.0, 1.0, np.NaN],
            [np.NaN, np.NaN, 2.0, np.NaN, 2.0],
            [np.NaN, np.NaN, np.NaN, 3.0, 3.0],
            ["B", "second", 4.0, 4.0, np.NaN],
            [np.NaN, np.NaN, 5.0, np.NaN, 5.0],
            [np.NaN, np.NaN, np.NaN, 6.0, 6.0]],
            columns=[ "ID", "Name", "val1", "val2", "val3"],
            index=[0, 0, 0, 1, 1, 1])

Out[5]:
    ID    Name  val1  val2  val3
0    A   first     1     1   NaN
0  NaN     NaN     2   NaN     2
0  NaN     NaN   NaN     3     3
1    B  second     4     4   NaN
1  NaN     NaN     5   NaN     5
1  NaN     NaN   NaN     6     6

with the desired result to look like this:

    ID    Name  val1  val2  val3
0    A   first     1     1     2
0  NaN     NaN     2     3     3
0  NaN     NaN   NaN   NaN   NaN
1    B  second     4     4     5
1  NaN     NaN     5     6     6
1  NaN     NaN   NaN   NaN   NaN

I have many thousands of rows in this dataframe, with each index containing up to a few hundred rows. My desired result will be very helpful when I to_csv the dataframe.

I have attempted to use sort_values(['val1','val2','val3']) on the whole data frame, but this results in the indices becoming disordered. I have tried to iterate through each index and sort in place, but this too does not restrict the NaN to the bottom of each indices' column. I have also tried to fillna to another value, such as 0, but I have not been successful here, either.

While I am certainly using it wrong, the na_position parameter in sort_values does not produce the desired outcome, though it seems this is likely what want.

Edit:

The final df's index is not required to be in numerical order as in my second example.

By changing ignore_index to False in the single line of @Leb's third code block,

pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=True)

pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=False)

and by creating a temp df for all rows in a given index, I was able to make this work - not pretty, but it orders things how I need them. If someone (certainly) has a better way, please let me know.

new_df = df.ix[0]
new_df = pd.concat([new_df[col].sort_values().reset_index(drop=True) for col in new_df], axis=1, ignore_index=False)
max_index = df.index[-1]
for i in range(1, max_index + 1):
    tmp = df.ix[i]
    tmp = pd.concat([tmp[col].sort_values().reset_index(drop=True) for col in tmp], axis=1, ignore_index=False)
    new_df = pd.concat([new_df,tmp])


In [10]: new_df
Out[10]:
    ID    Name  val1  val2  val3
0    A   first     1     1     2
1  NaN     NaN     2     3     3
2  NaN     NaN   NaN   NaN   NaN
0    B  second     4     4     5
1  NaN     NaN     5     6     6
2  NaN     NaN   NaN   NaN   NaN

619

asked Nov 04 '15 19:11

AGS

1 Answers

I know the issue of pushing nans to an edge has been discussed on github. For your particular frame, I'd probably do it manually at the Python level, and not worry about performance much. Something like

>>> df.groupby(level=0, sort=False).transform(lambda x: sorted(x,key=pd.isnull))
    ID    Name  val1  val2  val3
0    A   first     1     1     2
0  NaN     NaN     2     3     3
0  NaN     NaN   NaN   NaN   NaN
1    B  second     4     4     5
1  NaN     NaN     5     6     6
1  NaN     NaN   NaN   NaN   NaN

should work. Note that since sorted is a stable sort, and we're using pd.isnull as the key (where False < True), we push the NaNs to the end while preserving the order of the remaining objects. Also note that here I'm grouping just on the index; we could alternatively have grouped on whatever we wanted.

145

answered Oct 17 '22 17:10

DSM

Related questions
                            
                                Why matplotlib replace a right parenthesis with "!" in latex expression?
                            
                                Passing Python3 to virtualenvwrapper throws up ImportError
                            
                                Pandas: merge two dataframes ignoring NaN
                            
                                Python: Get all combinations of sequential elements of list
                            
                                Pass !, !=, ~, <, > as parameters
                            
                                Matplotlib scatter(): default value for size, marker shape
                            
                                How to apply parallel or asynchronous I/O file writing on a python piece of code
                            
                                Py2exe and selenium - IOError: [Errno 2] No such file or directory: '\\dist\\main.exe\\selenium\\webdriver\\firefox\\webdriver_prefs.json'
                            
                                Loading C# DLL with unmanaged exports into Python
                            
                                Automatic 'focus' of TextInput Kivy
                            
                                Remove non-duplicated entries
                            
                                Django rest framework serializer return a list instead of json
                            
                                Why does Django not have a view permission?
                            
                                How can I draw diagrams from database in python
                            
                                Aggregate UDFs with Python in Redshift
                            
                                python multiline command running from bash
                            
                                Neat way of popping key, value PAIR from dictionary?
                            
                                Numpy matrix combination
                            
                                pdfkit does not load css and JavaScript
                            
                                What does the matplotlib `hold` keyword argument do?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Pandas force NaN to bottom of each column at each index

Tags:

python

pandas

AGS

People also ask

1 Answers

DSM

Recent Activity

Donate For Us