Efficiently replace values in one column with values from another column in a Pandas DataFrame

Tags: python, replace, pandas, dataframe

I have a Pandas DataFrame like this:

       col1 col2 col3
    1   0.2  0.3  0.3
    2   0.2  0.3  0.3
    3     0  0.4  0.4
    4     0    0  0.3
    5     0    0    0
    6   0.1  0.4  0.4

I want to replace the values in col1 with the values from the second column (col2) only where col1 is equal to 0, and then, for any zeros that remain, do the same with the third column (col3). The desired result is:

       col1 col2 col3
    1   0.2  0.3  0.3
    2   0.2  0.3  0.3
    3   0.4  0.4  0.4
    4   0.3    0  0.3
    5     0    0    0
    6   0.1  0.4  0.4

I did it using the pandas replace method, but it seems too slow. I think there must be a faster way to accomplish this.

    df.col1.replace(0, df.col2, inplace=True)
    df.col1.replace(0, df.col3, inplace=True)

Is there a faster way to do this, using some other function instead of replace?

730

asked Oct 06 '16 18:10

Pablo

2 Answers

Using np.where is faster. Following the same pattern you used with replace:

    import numpy as np

    # Where col1 is 0 take col2; then, where col1 is still 0, take col3
    df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
    df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])

However, using a nested np.where is slightly faster:

    df['col1'] = np.where(df['col1'] == 0,
                          np.where(df['col2'] == 0, df['col3'], df['col2']),
                          df['col1'])
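To check the nested form against the question's data, here is a minimal self-contained sketch. The DataFrame literal is typed in from the question's table, and the intermediate name `fallback` is only for illustration; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (index 1..6).
df = pd.DataFrame(
    {
        "col1": [0.2, 0.2, 0.0, 0.0, 0.0, 0.1],
        "col2": [0.3, 0.3, 0.4, 0.0, 0.0, 0.4],
        "col3": [0.3, 0.3, 0.4, 0.3, 0.0, 0.4],
    },
    index=range(1, 7),
)

# Inner where: the replacement value is col2, unless col2 is 0, then col3.
fallback = np.where(df["col2"] == 0, df["col3"], df["col2"])

# Outer where: keep col1 unless it is 0, in which case use the fallback.
df["col1"] = np.where(df["col1"] == 0, fallback, df["col1"])

print(df["col1"].tolist())  # [0.2, 0.2, 0.4, 0.3, 0.0, 0.1]
```

Row 5 stays at 0 because all three columns are 0 there, which matches the desired result in the question.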

Timings

Using the following setup to produce a larger sample DataFrame and timing functions:

    import numpy as np
    import pandas as pd

    # Repeat the 6-row sample 10**4 times (60,000 rows)
    df = pd.concat([df]*10**4, ignore_index=True)

    def root_nested(df):
        df['col1'] = np.where(df['col1'] == 0, np.where(df['col2'] == 0, df['col3'], df['col2']), df['col1'])
        return df

    def root_split(df):
        df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
        df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])
        return df

    def pir2(df):
        # Mask zeros as NaN, back-fill across columns, then restore remaining NaN to 0
        df['col1'] = df.where(df.ne(0), np.nan).bfill(axis=1).col1.fillna(0)
        return df

    def pir2_2(df):
        # Position of the first non-zero value in each row (0 if the row is all zeros);
        # returns the new col1 values as an array rather than a DataFrame
        slc = (df.values != 0).argmax(axis=1)
        return df.values[np.arange(slc.shape[0]), slc]

    def andrew(df):
        df.col1[df.col1 == 0] = df.col2
        df.col1[df.col1 == 0] = df.col3
        return df

    def pablo(df):
        df['col1'] = df['col1'].replace(0, df['col2'])
        df['col1'] = df['col1'].replace(0, df['col3'])
        return df

I get the following timings:

    %timeit root_nested(df.copy())
    100 loops, best of 3: 2.25 ms per loop

    %timeit root_split(df.copy())
    100 loops, best of 3: 2.62 ms per loop

    %timeit pir2(df.copy())
    100 loops, best of 3: 6.25 ms per loop

    %timeit pir2_2(df.copy())
    1 loop, best of 3: 2.4 ms per loop

    %timeit andrew(df.copy())
    100 loops, best of 3: 8.55 ms per loop

I tried timing your method on the larger DataFrame, but it ran for multiple minutes without completing. For comparison, timing your method on just the 6-row example DataFrame (not the much larger one tested above) took 12.8 ms.
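If you are not in IPython and `%timeit` is unavailable, the comparison can be approximated with the standard-library `timeit` module. This is only a sketch: `time_ms` is a hypothetical helper, the repeat counts are arbitrary, and it covers just the two np.where variants. The numbers will differ by machine and pandas version, so no output is shown.

```python
import timeit

import numpy as np
import pandas as pd

# Six-row sample from the question, repeated 10**4 times as in the setup above.
base = pd.DataFrame(
    {
        "col1": [0.2, 0.2, 0.0, 0.0, 0.0, 0.1],
        "col2": [0.3, 0.3, 0.4, 0.0, 0.0, 0.4],
        "col3": [0.3, 0.3, 0.4, 0.3, 0.0, 0.4],
    }
)
big = pd.concat([base] * 10**4, ignore_index=True)


def root_nested(df):
    df["col1"] = np.where(
        df["col1"] == 0,
        np.where(df["col2"] == 0, df["col3"], df["col2"]),
        df["col1"],
    )
    return df


def root_split(df):
    df["col1"] = np.where(df["col1"] == 0, df["col2"], df["col1"])
    df["col1"] = np.where(df["col1"] == 0, df["col3"], df["col1"])
    return df


def time_ms(func, number=20, repeat=3):
    """Best average time per call in milliseconds (hypothetical helper)."""
    # Each call works on a fresh copy, so the copy cost is included,
    # just like %timeit func(df.copy()) above.
    runs = timeit.repeat(lambda: func(big.copy()), number=number, repeat=repeat)
    return min(runs) / number * 1e3


for func in (root_nested, root_split):
    print(f"{func.__name__}: {time_ms(func):.2f} ms per loop")
```

Taking the minimum over the repeats mirrors the "best of 3" figure that `%timeit` reported in the timings above.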

191

answered Oct 05 '22 08:10

root

I'm not sure if it's faster, but you're right that you can slice the dataframe to get your desired result.

    df.col1[df.col1 == 0] = df.col2
    df.col1[df.col1 == 0] = df.col3
    print(df)

Output:

       col1  col2  col3
    0   0.2   0.3   0.3
    1   0.2   0.3   0.3
    2   0.4   0.4   0.4
    3   0.3   0.0   0.3
    4   0.0   0.0   0.0
    5   0.1   0.4   0.4

Alternatively, if you want it to be more terse (though I don't know if it's faster), you can combine what you did with what I did.

    df.col1[df.col1 == 0] = df.col2.replace(0, df.col3)
    print(df)

Output:

       col1  col2  col3
    0   0.2   0.3   0.3
    1   0.2   0.3   0.3
    2   0.4   0.4   0.4
    3   0.3   0.0   0.3
    4   0.0   0.0   0.0
    5   0.1   0.4   0.4
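One caveat about the boolean-mask assignments above: `df.col1[mask] = ...` is chained indexing, which can trigger a SettingWithCopyWarning and, as far as I know, does not modify the frame when pandas Copy-on-Write is enabled. A sketch of the same two steps written with a single `.loc` call per step follows. This is a swapped-in variant, not the code from the answer, and it has not been timed here.

```python
import pandas as pd

# Sample data typed in from the question, with the default 0..5 index.
df = pd.DataFrame(
    {
        "col1": [0.2, 0.2, 0.0, 0.0, 0.0, 0.1],
        "col2": [0.3, 0.3, 0.4, 0.0, 0.0, 0.4],
        "col3": [0.3, 0.3, 0.4, 0.3, 0.0, 0.4],
    }
)

# Step 1: where col1 is 0, take the value from col2 on the same row.
mask = df["col1"] == 0
df.loc[mask, "col1"] = df.loc[mask, "col2"]

# Step 2: recompute the mask and fill the remaining zeros from col3.
mask = df["col1"] == 0
df.loc[mask, "col1"] = df.loc[mask, "col3"]

print(df["col1"].tolist())  # [0.2, 0.2, 0.4, 0.3, 0.0, 0.1]
```

Because `.loc` selects the rows and the column in one indexing operation, the assignment is applied to `df` itself rather than to an intermediate object.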

answered Oct 05 '22 09:10

Andrew
