Pandas unique values per row, variable number of columns with data

Question

Consider the below dataframe:

import pandas as pd
from numpy import nan

data = [
    (111, nan, nan, 111),
    (112, 112, nan, 115),
    (113, nan, nan, nan),
    (nan, nan, nan, nan),
    (118, 110, 117, nan),
]

df = pd.DataFrame(data, columns=[f'num{i}' for i in range(len(data[0]))])

    num0    num1    num2    num3
0   111.0   NaN     NaN     111.0
1   112.0   112.0   NaN     115.0
2   113.0   NaN     NaN     NaN
3   NaN     NaN     NaN     NaN
4   118.0   110.0   117.0   NaN

Assuming my index is unique, I want to retrieve the unique values per row, producing an output like the one below. I want to keep the empty rows.

    num1    num2    num3
0   111.0   NaN     NaN
1   112.0   115.0   NaN
2   113.0   NaN     NaN
3   NaN     NaN     NaN
4   110.0   117.0   118.0

I have a working, albeit slow, solution (see below). The order of the values in the output is not relevant, as long as all values are placed in the leftmost columns and nulls to the right. I'm looking for best practices and potential ideas to speed up the code. Thank you in advance.

def arrange_row(row):
    values = list(set(row.dropna(axis=1).values[0]))
    values = [nan] if not values else values
    series = pd.Series(values, index=[f"num{i}" for i in range(1, len(values)+1)])
    return series

df.groupby(level=-1).apply(arrange_row).unstack(level=-1)
pd.__version__ == '1.2.3'

Shubham Sharma · Accepted Answer

We can stack to reshape the dataframe, then group the reshaped frame on level=0 and aggregate using unique to get the unique values from each row. You can then create a new dataframe from these unique values and reindex it against the original index, which restores the all-NaN rows that stack dropped.

s = df.stack().groupby(level=0).unique()
pd.DataFrame([*s], index=s.index).reindex(df.index)

       0      1      2
0  111.0    NaN    NaN
1  112.0  115.0    NaN
2  113.0    NaN    NaN
3    NaN    NaN    NaN
4  118.0  110.0  117.0
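
For reference, here is a minimal self-contained sketch of the same stack/groupby/unique approach wrapped in a function. The names unique_per_row and prefix are hypothetical, not part of the answer above. The extra dropna() is an assumption added so the sketch does not depend on stack() discarding NaN by itself, and the columns are renamed only to mirror the num1, num2, ... labels from the question.

```python
import pandas as pd
from numpy import nan

data = [
    (111, nan, nan, 111),
    (112, 112, nan, 115),
    (113, nan, nan, nan),
    (nan, nan, nan, nan),
    (118, 110, 117, nan),
]
df = pd.DataFrame(data, columns=[f"num{i}" for i in range(len(data[0]))])


def unique_per_row(frame, prefix="num"):
    """Hypothetical helper: distinct non-null values per row, left-aligned."""
    # stack() gives one long Series indexed by (row label, column name);
    # dropna() is a no-op wherever stack() has already discarded NaN cells.
    stacked = frame.stack().dropna()
    # Distinct values per original row label, in order of first appearance.
    uniques = stacked.groupby(level=0).unique()
    # Ragged lists are padded with NaN on the right when building the frame.
    out = pd.DataFrame([list(v) for v in uniques], index=uniques.index)
    # Bring back rows that had no values at all (they vanished when stacking).
    out = out.reindex(frame.index)
    out.columns = [f"{prefix}{i}" for i in range(1, out.shape[1] + 1)]
    return out


result = unique_per_row(df)
print(result)
```

Values keep their order of first appearance within each row, which the question says is acceptable.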

Mayank Porwal · Answer

Use df.values with a list comprehension and df.dropna:

# Create a list of rows of dataframe
In [788]: l = df.values 

# Use List Comprehension to remove dups from above list of lists
In [789]: l_without_dupes = [list(dict.fromkeys(i)) for i in l]

# Create a new dataframe from above list and drop the column with all NaN's
In [795]: res_df = pd.DataFrame(l_without_dupes).dropna(axis=1, how='all')

In [796]: res_df
Out[796]: 
       0      1      2
0  111.0    NaN    NaN
1  112.0    NaN  115.0
2  113.0    NaN    NaN
3    NaN    NaN    NaN
4  118.0  110.0  117.0
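
Note that row 1 in the output above has a NaN between 112.0 and 115.0, while the question asks for values on the left and nulls on the right. This is most likely because NaN never compares equal to itself, so dict.fromkeys keeps each NaN as a separate key instead of collapsing them. A possible adjustment, sketched below under that assumption, is to filter out NaN inside each row before deduplicating and let the DataFrame constructor pad the shorter rows:

```python
import numpy as np
import pandas as pd
from numpy import nan

data = [
    (111, nan, nan, 111),
    (112, 112, nan, 115),
    (113, nan, nan, nan),
    (nan, nan, nan, nan),
    (118, 110, 117, nan),
]
df = pd.DataFrame(data, columns=[f"num{i}" for i in range(len(data[0]))])

# Drop NaN from each row first, then deduplicate while preserving order.
rows = [
    list(dict.fromkeys(v for v in row if not np.isnan(v)))
    for row in df.to_numpy()
]

# Rows of different lengths are right-padded with NaN; an empty list
# becomes an all-NaN row, so no reindex or dropna step is needed.
res_df = pd.DataFrame(rows, index=df.index)
print(res_df)
```

This assumes every column is numeric, since np.isnan does not accept strings or other objects.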

Tags: python, pandas, dataframe, apply

Francisco
