Why use pandas.assign rather than simply initializing a new column?

Tags:

python

pandas

I just discovered the assign method for pandas dataframes, and it looks nice and very similar to dplyr's mutate in R. However, I've always gotten by just initializing a new column 'on the fly'. Is there a reason why assign is better?

For instance (based on the example in the pandas documentation), to create a new column in a dataframe, I could just do this:

import numpy as np
from pandas import DataFrame
df = DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
df['ln_A'] = np.log(df['A'])

but the pandas.DataFrame.assign documentation recommends doing this:

df.assign(ln_A=lambda x: np.log(x.A))
# or
newcol = np.log(df['A'])
df.assign(ln_A=newcol)

Both methods produce the same dataframe. In fact, the first method (my 'on the fly' assignment) is significantly faster (0.202 seconds for 1000 iterations) than the .assign method (0.353 seconds for 1000 iterations).

So is there a reason I should stop using my old method in favour of df.assign?

418

asked Jan 09 '18 23:01

sacuL

1 Answer

The difference concerns whether you wish to modify an existing frame, or create a new frame while leaving the original frame as it was.

In particular, DataFrame.assign returns a new object containing a copy of the original data with the requested changes applied; the original frame remains unchanged.

In your particular case:

>>> df = DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})

Now suppose you wish to create a new frame in which A is everywhere 1 without destroying df. Then you could use .assign:

>>> new_df = df.assign(A=1)

If you do not wish to keep the original values, then df["A"] = 1 is clearly more appropriate. This also explains the speed difference: .assign must by necessity copy the data, while [...] assignment modifies the existing frame in place.
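As a minimal sketch of the distinction above (assuming pandas and NumPy are installed; the helper variable names are illustrative and not part of either library), the following checks that .assign leaves df untouched while bracket assignment changes it in place:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(1, 11), "B": np.random.randn(10)})

# .assign returns a new frame; df itself keeps its original column A.
new_df = df.assign(A=1)
assign_left_original_intact = df["A"].tolist() == list(range(1, 11))
new_frame_has_change = (new_df["A"] == 1).all()

# Bracket assignment modifies df in place.
df["A"] = 1
bracket_changed_original = (df["A"] == 1).all()

print(new_df is df)                  # False: a separate object
print(assign_left_original_intact)   # True
print(new_frame_has_change)          # True
print(bracket_changed_original)      # True
```

If the original values are still needed later in the script, the .assign form avoids having to make an explicit copy first.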

118

answered Sep 18 '22 15:09

donkopotamus
