
Group by and aggregate columns but create NaN if values do not match

I have a data frame like the following:

import numpy as np
import pandas as pd

test = pd.DataFrame({'ID': [4, 5, 6, 6, 6, 7, 7, 7],
                     'val1': ['one', 'one', 'two', 'two', 'three', np.nan, 'seven', 'seven'],
                     'val2': ['hi', 'bye', 'hola', 'hola', 'hola', 'ciao', 'ciao', 'namaste'],
                     'val3': [3, 3, 4, np.nan, 4, 5, 5, 6]})

test
   ID   val1     val2  val3
0   4    one       hi   3.0
1   5    one      bye   3.0
2   6    two     hola   4.0
3   6    two     hola   NaN
4   6  three     hola   4.0
5   7    NaN     ciao   5.0
6   7  seven     ciao   5.0
7   7  seven  namaste   6.0

Each ID has some measured values, with some IDs being done in triplicate.

If there is any disagreement between the replicates for an ID in a specific column, then I want the new data frame to have a NaN for that value.

If a NaN is already present for one value (treat it as not measured) but the other two replicates for that sample match, then I want that agreed value in the final data frame. If the two measured values disagree, then NaN.

I was thinking of using pandas groupby then aggregate for this, but I wasn't sure of how to do the logic for the aggregate function.

Essentially the output I am looking for is like:

pd.DataFrame({'ID': [4, 5, 6, 7],
              'val1': ['one', 'one', np.nan, 'seven'],
              'val2': ['hi', 'bye', 'hola', np.nan],
              'val3': [3, 3, 4, np.nan]})

   ID   val1  val2  val3
0   4    one    hi   3.0
1   5    one   bye   3.0
2   6    NaN  hola   4.0
3   7  seven   NaN   NaN

Could you suggest how to do this?

Thanks!

Jack

asked Aug 20 '18 by Jack Arnestad

2 Answers

Use groupby with a custom aggregate: when all non-NaN values in a group agree (nunique() == 1), take the mode; otherwise return NaN.

test.groupby('ID', as_index=False).agg(lambda x: x.mode()[0] if x.nunique() == 1 else np.nan)
Out[372]: 
   ID   val1  val2  val3
0   4    one    hi   3.0
1   5    one   bye   3.0
2   6    NaN  hola   4.0
3   7  seven   NaN   NaN
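A minimal alternative sketch of the same idea, spelling out the "all non-NaN values agree" check with dropna/unique instead of mode/nunique (the helper name agree is my own, not from the answer):

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'ID': [4, 5, 6, 6, 6, 7, 7, 7],
                     'val1': ['one', 'one', 'two', 'two', 'three', np.nan, 'seven', 'seven'],
                     'val2': ['hi', 'bye', 'hola', 'hola', 'hola', 'ciao', 'ciao', 'namaste'],
                     'val3': [3, 3, 4, np.nan, 4, 5, 5, 6]})

def agree(s):
    """Return the group's single agreed value (NaN ignored); NaN on disagreement."""
    vals = s.dropna().unique()
    return vals[0] if len(vals) == 1 else np.nan

out = test.groupby('ID', as_index=False).agg(agree)
```

This reads the requirement directly: drop the unmeasured NaNs first, then demand exactly one distinct value per group.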
answered Oct 07 '22 by BENY

This works because of how you've defined your problem.

First, take the first non-null value in each column for each ID. Next, check which columns have exactly one distinct value per group and mask everything else with NaN.

g = test.groupby('ID')
v = g.first()                       # first non-null value per column within each group
v.where(g.nunique().eq(1)).reset_index()

   ID   val1  val2  val3
0   4    one    hi   3.0
1   5    one   bye   3.0
2   6    NaN  hola   4.0
3   7  seven   NaN   NaN
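The same first/nunique idea can also be written with a per-row mask via transform, which sidesteps any index alignment between two grouped frames (a sketch using the test frame from the question):

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'ID': [4, 5, 6, 6, 6, 7, 7, 7],
                     'val1': ['one', 'one', 'two', 'two', 'three', np.nan, 'seven', 'seven'],
                     'val2': ['hi', 'bye', 'hola', 'hola', 'hola', 'ciao', 'ciao', 'namaste'],
                     'val3': [3, 3, 4, np.nan, 4, 5, 5, 6]})

vals = test.drop(columns='ID')

# broadcast each group's distinct-value count (NaN excluded) back onto the rows
agreeing = vals.groupby(test['ID']).transform('nunique').eq(1)

# NaN-out the cells of disagreeing columns, then take the first non-null per group
masked = vals.where(agreeing)
masked.insert(0, 'ID', test['ID'])
out = masked.groupby('ID', as_index=False).first()
```

Because first() skips nulls, a group like ID 6's val3 (4, NaN, 4) still collapses to 4.0.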
answered Oct 07 '22 by cs95