Conditional replacement of multiple columns based on column values in pandas DataFrame

Tags: python, pandas

I would like to simultaneously replace the values of multiple columns with the corresponding values in other columns, based on the values in the first group of columns (specifically, where one of the first columns is blank). Here's an example of what I'm trying to do:

import pandas as pd

df = pd.DataFrame({'a1':['m', 'n', 'o', 'p'],
                   'a2':['q', 'r', 's', 't'],
                   'b1':['',  '',  'a', '' ],
                   'b2':['',  '',  'b',  '']})

df

#   a1 a2 b1 b2
# 0  m  q
# 1  n  r
# 2  o  s  a  b
# 3  p  t

I'd like to replace the '' values in b1 and b2 with the corresponding values in a1 and a2, where b1 is blank:

#   a1 a2 b1 b2
# 0  m  q  m  q
# 1  n  r  n  r
# 2  o  s  a  b
# 3  p  t  p  t

Here's my thought process (I'm relatively new to pandas, so I'm probably speaking with a heavy R accent here):

missing = (df.b1 == '')

# First thought:
df[missing, ['b1', 'b2']] = df[missing, ['a1', 'a2']]
# TypeError: 'Series' objects are mutable, thus they cannot be hashed

# Fair enough  
df[tuple(missing), ('b1', 'b2')] = df[tuple(missing), ('a1', 'a2')]
# KeyError: ((True, True, False, True), ('a1', 'a2'))

# Obviously I'm going about this wrong.  Maybe I need to use indexing?
df[['b1', 'b2']].ix[missing,:]
#   b1 b2
# 0      
# 1      
# 3      

# That looks right
df[['b1', 'b2']][missing, :] = df[['a1', 'a2']].ix[missing, :]
# TypeError: 'Series' objects are mutable, thus they cannot be hashed
# Deja vu

df[['b1', 'b2']].ix[tuple(missing), :] = df[['a1', 'a2']].ix[tuple(missing), :]
# ValueError: could not convert string to float:
# Uhh...

I could do it column-by-column:

df['b1'].ix[missing] = df['a1'].ix[missing]
df['b2'].ix[missing] = df['a2'].ix[missing]

...but I suspect there's a more idiomatic way to do this. Thoughts?

Update: To clarify, I'm specifically wondering whether all columns can be updated at the same time. For instance, a hypothetical modification of Primer's answer (this doesn't work and results in NaNs, although I'm unsure why):

df.loc[missing, ['b1', 'b2']] = df.loc[missing, ['a1', 'a2']]

#   a1 a2   b1   b2
# 0  m  q  NaN  NaN
# 1  n  r  NaN  NaN
# 2  o  s    a    b
# 3  p  t  NaN  NaN


asked Mar 12 '15 19:03

danpelota

2 Answers

How about

df[['b1', 'b2']] = df[['b1', 'b2']].where(df[['b1', 'b2']] != '', df[['a1', 'a2']].values)

This returns:

  a1 a2 b1 b2
0  m  q  m  q
1  n  r  n  r
2  o  s  a  b
3  p  t  p  t
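
As a hedged note on why the `.values` part matters: when one DataFrame is assigned into another, pandas aligns on column labels, so values from `a1`/`a2` find no matching `b1`/`b2` labels and end up as NaN. Passing a plain NumPy array makes the assignment positional instead. The sketch below applies the same idea to the `.loc` attempt from the question's update; it reuses the question's names and assumes a reasonably recent pandas where `to_numpy()` is available. Note that it uses the `b1` mask for both columns, whereas the `where` version above checks each cell on its own.

```python
import pandas as pd

df = pd.DataFrame({'a1': ['m', 'n', 'o', 'p'],
                   'a2': ['q', 'r', 's', 't'],
                   'b1': ['', '', 'a', ''],
                   'b2': ['', '', 'b', '']})

# Rows where b1 is blank (same mask as in the question).
missing = df['b1'] == ''

# The right-hand side is a bare array, so values are assigned by position
# rather than aligned on the (non-matching) column labels.
df.loc[missing, ['b1', 'b2']] = df.loc[missing, ['a1', 'a2']].to_numpy()

print(df)
```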


answered Oct 13 '22 20:10

Alex

You could do it this way:

mask1 = df.b1.str.len() == 0
mask2 = df.b2.str.len() == 0
df.loc[mask1, 'b1'] = df.loc[mask1, 'a1']
df.loc[mask2, 'b2'] = df.loc[mask2, 'a2']
print(df)

  a1 a2 b1 b2
0  m  q  m  q
1  n  r  n  r
2  o  s  a  b
3  p  t  p  t

Alternatively, masks defined like this will also work:

mask1 = df.b1 == ''
mask2 = df.b2 == ''
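
If there are more than two column pairs, the per-column approach can be wrapped in a loop. This is only a sketch: the `fallbacks` mapping is a hypothetical name introduced for illustration, and each column is checked against its own blank cells, as with `mask1` and `mask2` above.

```python
import pandas as pd

df = pd.DataFrame({'a1': ['m', 'n', 'o', 'p'],
                   'a2': ['q', 'r', 's', 't'],
                   'b1': ['', '', 'a', ''],
                   'b2': ['', '', 'b', '']})

# Hypothetical mapping: target column -> column to fall back on.
fallbacks = {'b1': 'a1', 'b2': 'a2'}

for target, source in fallbacks.items():
    # Series.mask replaces entries where the condition is True,
    # taking the replacement from the row-aligned source column.
    df[target] = df[target].mask(df[target] == '', df[source])

print(df)
```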

answered Oct 13 '22 22:10

Primer

