
Conditional replacement of multiple columns based on column values in pandas DataFrame

Tags:

python

pandas

I would like to simultaneously replace the values of multiple columns with the corresponding values in other columns, based on the values in the first group of columns (specifically, where one of the first columns is blank). Here's an example of what I'm trying to do:

import pandas as pd

df = pd.DataFrame({'a1':['m', 'n', 'o', 'p'],
                   'a2':['q', 'r', 's', 't'],
                   'b1':['',  '',  'a', '' ],
                   'b2':['',  '',  'b',  '']})

df

#   a1 a2 b1 b2
# 0  m  q
# 1  n  r
# 2  o  s  a  b
# 3  p  t

I'd like to replace the '' values in b1 and b2 with the corresponding values in a1 and a2, where b1 is blank:

#   a1 a2 b1 b2
# 0  m  q  m  q
# 1  n  r  n  r
# 2  o  s  a  b
# 3  p  t  p  t

Here's my thought process (I'm relatively new to pandas, so I'm probably speaking with a heavy R accent here):

missing = (df.b1 == '')

# First thought:
df[missing, ['b1', 'b2']] = df[missing, ['a1', 'a2']]
# TypeError: 'Series' objects are mutable, thus they cannot be hashed

# Fair enough  
df[tuple(missing), ('b1', 'b2')] = df[tuple(missing), ('a1', 'a2')]
# KeyError: ((True, True, False, True), ('a1', 'a2'))

# Obviously I'm going about this wrong.  Maybe I need to use indexing?
df[['b1', 'b2']].ix[missing,:]
#   b1 b2
# 0      
# 1      
# 3      

# That looks right
df[['b1', 'b2']][missing, :] = df[['a1', 'a2']].ix[missing, :]
# TypeError: 'Series' objects are mutable, thus they cannot be hashed
# Deja vu

df[['b1', 'b2']].ix[tuple(missing), :] = df[['a1', 'a2']].ix[tuple(missing), :]
# ValueError: could not convert string to float:
# Uhh...

I could do it column-by-column:

# .ix was removed in pandas 1.0; .loc does the same label-based assignment
df.loc[missing, 'b1'] = df.loc[missing, 'a1']
df.loc[missing, 'b2'] = df.loc[missing, 'a2']

...but I suspect there's a more idiomatic way to do this. Thoughts?

Update: To clarify, I'm specifically wondering whether all columns can be updated at the same time. For instance, a hypothetical modification of Primer's answer (this doesn't work and results in NaNs, although I'm unsure why):

df.loc[missing, ['b1', 'b2']] = df.loc[missing, ['a1', 'a2']]

#   a1 a2   b1   b2
# 0  m  q  NaN  NaN
# 1  n  r  NaN  NaN
# 2  o  s    a    b
# 3  p  t  NaN  NaN
danpelota asked Mar 12 '15 19:03




2 Answers

How about:

df[['b1', 'b2']] = df[['b1', 'b2']].where(df[['b1', 'b2']] != '', df[['a1', 'a2']].values)

This returns:

  a1 a2 b1 b2
0  m  q  m  q
1  n  r  n  r
2  o  s  a  b
3  p  t  p  t
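
The `.values` call is what makes this work. When the right-hand side is a DataFrame, pandas aligns it to the target by column label, and since 'a1'/'a2' never match 'b1'/'b2' the result is filled with NaN, which is most likely what happened in the question's update. Note also that `where` tests each cell on its own rather than only b1. A minimal sketch of the same idea with `.loc` and the question's single `missing` mask, assuming a pandas version that provides `.to_numpy()` (`.values` should behave the same way here):

```python
import pandas as pd

df = pd.DataFrame({'a1': ['m', 'n', 'o', 'p'],
                   'a2': ['q', 'r', 's', 't'],
                   'b1': ['',  '',  'a', ''],
                   'b2': ['',  '',  'b', '']})

# Rows where b1 is blank, as in the question
missing = df['b1'] == ''

# Converting the right-hand side to a NumPy array drops its column labels,
# so the values are assigned by position instead of being aligned by name.
df.loc[missing, ['b1', 'b2']] = df.loc[missing, ['a1', 'a2']].to_numpy()

print(df)
```

This updates both columns in one statement and only touches rows where b1 is blank, so a non-blank b2 in such a row would also be overwritten.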
Alex answered Oct 13 '22 20:10



You could do it this way:

mask1 = df.b1.str.len() == 0
mask2 = df.b2.str.len() == 0
df.loc[mask1, 'b1'] = df.loc[mask1, 'a1']
df.loc[mask2, 'b2'] = df.loc[mask2, 'a2']
print(df)

  a1 a2 b1 b2
0  m  q  m  q
1  n  r  n  r
2  o  s  a  b
3  p  t  p  t

Alternatively, masks defined like this will also work:

mask1 = df.b1 == ''
mask2 = df.b2 == ''
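
If there are more than two column pairs, the per-column pattern above can be written as a loop. This is only a sketch: the `pairs` mapping is a hypothetical helper that does not appear in the original answer, and each target column is tested with its own mask, as with mask1 and mask2.

```python
import pandas as pd

df = pd.DataFrame({'a1': ['m', 'n', 'o', 'p'],
                   'a2': ['q', 'r', 's', 't'],
                   'b1': ['',  '',  'a', ''],
                   'b2': ['',  '',  'b', '']})

# Hypothetical mapping (not from the answer): target column -> fallback column
pairs = {'b1': 'a1', 'b2': 'a2'}

for target, source in pairs.items():
    mask = df[target] == ''                      # blank cells in this target column
    df.loc[mask, target] = df.loc[mask, source]  # Series aligns on the row index

print(df)
```

Because each right-hand side is a Series, pandas aligns it on the row index only, so the column-label mismatch that produces NaNs in the two-column assignment does not come up.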
Primer answered Oct 13 '22 22:10
