
How to add a new column to an existing DataFrame?

Tags:

python

pandas

dataframe

chained-assignment

I have the following indexed DataFrame with named columns and non-contiguous row numbers:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing DataFrame without changing anything else in it (the new column always has the same length as the DataFrame).

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

How can I add column e to the above example?

724

asked Sep 23 '12 19:09

tomasz74

1 Answer

Edit 2017

As indicated in the comments and by @Alexander, the best method to add the values of a Series as a new column of a DataFrame is now probably assign:

df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
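
A minimal sketch of the assign approach, using fixed values in place of the random ones so the result is reproducible. The frame below is a cut-down version of the one in the question, and the names new_values and df2 are chosen here for illustration only. Note that .values strips the Series index, so the values are placed by position rather than by label, and that assign returns a new DataFrame rather than modifying df1.

```python
import pandas as pd

# Cut-down version of the question's frame, with a non-contiguous index.
df1 = pd.DataFrame(
    {
        "a": [0.671399, 0.446172, 0.614758],
        "b": [0.101208, -0.243316, 0.075793],
    },
    index=[2, 3, 5],
)

# The Series from the question has the default index 0, 1, 2.
new_values = pd.Series([-0.335485, -1.166658, -0.385571])

# .values discards the Series index, so the data is assigned positionally.
# assign() returns a new DataFrame; df1 itself is left untouched.
df2 = df1.assign(e=new_values.values)

print(df2)
```

Binding the result to a new name (df2) keeps the original frame available; rebinding to df1, as in the line above, has the same effect as an in-place addition.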

Edit 2015
Some users reported getting the SettingWithCopyWarning with this code.
However, the code still runs correctly with pandas 0.16.1, the current version at the time of this edit.

>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> pd.version.short_version
'0.16.1'

The SettingWithCopyWarning aims to inform you of a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily mean you did it wrong (it can trigger false positives), but from 0.13.0 it lets you know that there are more adequate methods for the same purpose. So, if you get the warning, just follow its advice: Try using .loc[row_index,col_indexer] = value instead

>>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
>>>
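
The warning typically shows up when the frame being modified was itself obtained by slicing or filtering another frame. A hedged sketch of one common way to sidestep it follows: take an explicit copy of the subset before adding the column. The filter condition, the name subset, and the fixed values are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "a": [0.671399, 0.446172, 0.614758],
        "b": [0.101208, -0.243316, 0.075793],
    },
    index=[2, 3, 5],
)

# Filtering returns a new object; .copy() makes it explicit that `subset`
# is independent of df1, so later writes are unambiguous.
subset = df1[df1["a"] > 0.5].copy()

# Add the new column via .loc, with the Series indexed like the subset.
subset.loc[:, "f"] = pd.Series([1.0, 2.0], index=subset.index)

print(subset)
print(df1)  # the original frame has no column 'f'
```

Whether the warning is emitted at all depends on the pandas version, so the copy is best read as a way of stating intent rather than as a fix for one specific release.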

In fact, this is currently the more efficient method, as described in the pandas docs.

Original answer:

Use the original df1 index to create the Series:

df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
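
To see why passing index=df1.index matters, the sketch below compares assigning the Series from the question directly against assigning it with the frame's index. pandas aligns on index labels during column assignment, so a Series indexed 0, 1, 2 only matches the frame's label 2. The names s, misaligned, and aligned are illustrative, and the frame is a cut-down version of the one in the question.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "a": [0.671399, 0.446172, 0.614758],
        "b": [0.101208, -0.243316, 0.075793],
    },
    index=[2, 3, 5],
)

# Series with the default index 0, 1, 2, as in the question.
s = pd.Series([-0.335485, -1.166658, -0.385571])

# Direct assignment aligns by label: only label 2 exists in both,
# so rows 3 and 5 receive NaN.
misaligned = df1.copy()
misaligned["e"] = s

# Rebuilding the Series on df1's index places the values row by row.
aligned = df1.copy()
aligned["e"] = pd.Series(s.values, index=df1.index)

print(misaligned)
print(aligned)
```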

134

answered Sep 27 '22 20:09

joaquin
