Pandas update multiple columns at once

Tags: python, pandas, dataframe

I'm trying to update a couple of fields at once. I have two data sources and I'm trying to reconcile them. I know I could do some ugly merging and then delete columns, but I was expecting the code below to work:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([['A','B','C',np.nan,np.nan,np.nan],
                       ['D','E','F',np.nan,np.nan,np.nan],
                       [np.nan,np.nan,np.nan,'a','b','d'],
                       [np.nan,np.nan,np.nan,'d','e','f']],
                      columns = ['Col1','Col2','Col3','col1_v2','col2_v2','col3_v2'])

    print(df)

      Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
    0    A    B    C     NaN     NaN     NaN
    1    D    E    F     NaN     NaN     NaN
    2  NaN  NaN  NaN       a       b       d
    3  NaN  NaN  NaN       d       e       f

    # update
    df.loc[df['Col1'].isnull(),['Col1','Col2', 'Col3']] = df[['col1_v2','col2_v2','col3_v2']]

    print(df)

      Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
    0    A    B    C     NaN     NaN     NaN
    1    D    E    F     NaN     NaN     NaN
    2  NaN  NaN  NaN       a       b       d
    3  NaN  NaN  NaN       d       e       f

My desired output would be:

      Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
    0    A    B    C     NaN     NaN     NaN
    1    D    E    F     NaN     NaN     NaN
    2    a    b    d       a       b       d
    3    d    e    f       d       e       f

I'm betting it has to do with updating/setting on a slice, but I always use .loc to update values, just not on multiple columns at once.

I feel like there's an easy way to do this that I'm just missing. Any thoughts or suggestions would be welcome!

Edit to reflect solution below:

Thanks for the comment on the indexes. However, I have a question about this as it relates to series. If I wanted to update an individual series in a similar manner, I could do something like this:

    df.loc[df['Col1'].isnull(),['Col1']] = df['col1_v2']

    print(df)

      Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
    0    A    B    C     NaN     NaN     NaN
    1    D    E    F     NaN     NaN     NaN
    2    a  NaN  NaN       a       b       d
    3    d  NaN  NaN       d       e       f

Note that I didn't account for the indexes here. I filtered to a 2x1 selection and set it equal to a 4x1 series, yet pandas handled it correctly. Thoughts? I'm trying to better understand something I've used for a while, but I guess I don't have a full grasp of the underlying mechanism or rule.
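One plausible reading of the Series case, sketched below on a two-column toy frame (not the asker's full data): a Series carries only a row index, so pandas can align it to the selected rows and there is no column label to mismatch. The sketch uses a scalar column label `'Col1'` rather than the one-element list from the question, and the names `series_case` and `frame_case` are made up for illustration. A one-column DataFrame with a different column name is included as a contrast.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': ['A', 'D', np.nan, np.nan],
    'col1_v2': [np.nan, np.nan, 'a', 'd'],
})
mask = df['Col1'].isnull()

# Series on the right-hand side: it has a row index but no column labels,
# so pandas aligns it on the selected rows (2 and 3) and ignores the rest.
series_case = df.copy()
series_case.loc[mask, 'Col1'] = series_case['col1_v2']

# One-column DataFrame on the right-hand side: it also carries a column
# label ('col1_v2'), which does not match 'Col1', so alignment yields NaN.
frame_case = df.copy()
frame_case.loc[mask, ['Col1']] = frame_case[['col1_v2']]

print(series_case)
print(frame_case)
```

So the 4x1 Series is not assigned wholesale; only the entries whose row labels match the selection are used.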

521

asked May 23 '16 20:05

flyingmeatball

1 Answer

You want to replace:

    print(df.loc[df['Col1'].isnull(),['Col1','Col2', 'Col3']])

      Col1 Col2 Col3
    2  NaN  NaN  NaN
    3  NaN  NaN  NaN

With:

    replace_with_this = df.loc[df['Col1'].isnull(),['col1_v2','col2_v2', 'col3_v2']]
    print(replace_with_this)

      col1_v2 col2_v2 col3_v2
    2       a       b       d
    3       d       e       f

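Seems reasonable. However, when you assign a DataFrame, pandas aligns it with the target on both the row index and the column labels. Here the column names on the right (col1_v2, col2_v2, col3_v2) do not match the ones on the left (Col1, Col2, Col3), so nothing lines up and the target cells stay NaN.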

So, this should work:

    df.loc[df['Col1'].isnull(),['Col1','Col2', 'Col3']] = replace_with_this.values

    print(df)

      Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
    0    A    B    C     NaN     NaN     NaN
    1    D    E    F     NaN     NaN     NaN
    2    a    b    d       a       b       d
    3    d    e    f       d       e       f

I accounted for columns by using .values at the end. This strips the index and column labels from the replace_with_this dataframe, so the values are assigned purely by position.
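As a rough sketch of the same idea on current pandas, the block below shows two ways to get past the column mismatch. The first follows the answer but uses `to_numpy()`, the newer spelling of `.values`. The second is an alternative not taken from the answer: rename the source columns so the labels match, and let pandas align. The names `targets`, `sources`, `by_position` and `by_label` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['A', 'B', 'C', np.nan, np.nan, np.nan],
     ['D', 'E', 'F', np.nan, np.nan, np.nan],
     [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
     [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
    columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

targets = ['Col1', 'Col2', 'Col3']
sources = ['col1_v2', 'col2_v2', 'col3_v2']
mask = df['Col1'].isnull()

# Option 1: drop the labels and assign by position, as in the answer.
by_position = df.copy()
by_position.loc[mask, targets] = by_position.loc[mask, sources].to_numpy()

# Option 2: keep label alignment, but rename the source columns so they
# match the target columns; pandas then lines up rows and columns itself.
by_label = df.copy()
renamed = by_label[sources].rename(columns=dict(zip(sources, targets)))
by_label.loc[mask, targets] = renamed

print(by_position)
print(by_label)
```

The positional version relies on the source and target columns being listed in the same order; the rename version does not, at the cost of one extra step.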

196

answered Sep 22 '22 01:09

piRSquared
