Efficient way to add new column to pandas dataframe

Tags:

python

pandas

I know two ways of adding a new column to a pandas DataFrame:

df_new = df.assign(new_column=default_value)

and

df[new_column] = default_value

The first one does not add the column in place, but the second one does. So which one is more efficient to use?

Apart from these two, is there an even more efficient method?

398

asked Sep 12 '18 07:09

thelogicalkoan

1 Answer

I think the second one is faster; assign is meant for when you want clean code that chains all the functions together in one line:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.random.rand(10000)})

default_value = 10

In [114]: %timeit df_new = df.assign(new_column=default_value)
228 µs ± 4.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [115]: %timeit df['new_column'] = default_value
86.1 µs ± 654 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
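
To make the chaining point concrete, here is a minimal sketch. The three-row DataFrame and the `query`/`reset_index` steps are made up for illustration and are not part of the timings above. It shows `assign` returning a new object that can sit inside a method chain, while plain column assignment changes the existing DataFrame.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({'A': [1.0, 2.0, 3.0]})
default_value = 10

# assign returns a new DataFrame, so it can be one step in a method chain
df_new = (
    df.assign(new_column=default_value)
      .query('A > 1')
      .reset_index(drop=True)
)

# The original df is not modified by assign
columns_after_assign = list(df.columns)

# Column assignment adds the column to df itself
df['new_column'] = default_value
columns_after_setitem = list(df.columns)

print(columns_after_assign)
print(columns_after_setitem)
print(df_new)
```

If the result is only needed as an intermediate step in a longer expression, `assign` avoids a separate statement; if the DataFrame is modified once and kept, plain assignment is the shorter route.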

I used perfplot to plot the timings for different DataFrame lengths (plot: https://i.stack.imgur.com/F90q3.png):

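import numpy as np
import pandas as pd
import perfplot

default_value = 10

# assign: returns a new DataFrame and leaves the input unchanged
def chained(df):
    df = df.assign(new_column=default_value)
    return df

# column assignment: modifies df in place and returns the same object
def no_chained(df):
    df['new_column'] = default_value
    return df

# build a one-column DataFrame with n random rows
def make_df(n):
    df = pd.DataFrame({'A':np.random.rand(n)})
    return df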

perfplot.show(
    setup=make_df,
    kernels=[chained, no_chained],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
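
If perfplot or IPython is not available, the same comparison can be roughly reproduced with the standard library. This is only a sketch under stated assumptions: the helper `best_time`, the repeat count, and the 10000-row size (chosen to match the example above) are hypothetical. A fresh DataFrame is built before every measurement so the in-place version really adds a new column each time.

```python
import time

import numpy as np
import pandas as pd

default_value = 10
n = 10_000  # assumed size, matching the 10000-row example above


def with_assign(df):
    # returns a new DataFrame
    return df.assign(new_column=default_value)


def with_setitem(df):
    # modifies df in place
    df['new_column'] = default_value
    return df


def best_time(func, repeats=50):
    """Return the fastest of `repeats` single runs, in seconds (hypothetical helper)."""
    best = float('inf')
    for _ in range(repeats):
        # fresh frame per run, created outside the timed region
        df = pd.DataFrame({'A': np.random.rand(n)})
        start = time.perf_counter()
        func(df)
        best = min(best, time.perf_counter() - start)
    return best


t_assign = best_time(with_assign)
t_setitem = best_time(with_setitem)
print(f"assign:  {t_assign * 1e6:.1f} µs")
print(f"setitem: {t_setitem * 1e6:.1f} µs")
```

Absolute numbers depend on the machine and the pandas version, so only the relative difference is meaningful.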

answered Sep 30 '22 13:09

jezrael
