How to add a new column to an existing DataFrame?

I have the following indexed DataFrame with named columns and non-contiguous row numbers:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing data frame and do not want to change anything in the data frame (i.e., the new column always has the same length as the DataFrame).

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

How can I add column e to the above example?

tomasz74 asked Sep 23 '12 19:09


People also ask

How do I add a column to an existing DF in Python?

Using assign(): The DataFrame.assign() method can be used when you need to insert several new columns at once or overwrite the values of existing columns. It returns a new DataFrame rather than modifying the original. Note that a Series passed to assign is aligned on the DataFrame's index; to ignore the index of the column being added, pass the underlying array instead (for example with .values).

How do I add to an existing data frame?

The append() method appended the rows of another DataFrame to the end of the given DataFrame and returned a new DataFrame object. Columns not present in the original DataFrame were added as new columns, and the new cells were filled with NaN. With ignore_index=True, the existing index labels are not used. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat is its replacement.
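A minimal sketch of the same row-wise append using pd.concat, which works on current pandas versions. The frames `top` and `bottom` and their column names are made up for illustration.

```python
import pandas as pd

# Two illustrative frames that share column "a" but differ otherwise.
top = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
bottom = pd.DataFrame({"a": [5], "c": [6]})

# Stack the rows of both frames. Columns missing from either frame
# are filled with NaN. ignore_index=True discards the original row
# labels and renumbers the result from 0.
combined = pd.concat([top, bottom], ignore_index=True)

print(combined)
```

Note that this adds rows, not a column, so it answers a different question from the one asked at the top of this page.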

How do you add a new column to a DataFrame based on another column?

Using apply(): If you need to apply a function to existing columns in order to compute values that will become a new column of the same DataFrame, the pandas.DataFrame.apply() method should do the trick.
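As a rough illustration, the sketch below derives a new column from two existing ones with DataFrame.apply and axis=1. The frame and the column names `a`, `b`, and `total` are assumptions made up for this example.

```python
import pandas as pd

# Illustrative frame with a non-contiguous index, as in the question.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]},
                  index=[2, 3, 5])

# axis=1 passes each row to the function as a Series. The result is a
# Series that carries df's own index, so it lines up row for row.
df["total"] = df.apply(lambda row: row["a"] + row["b"], axis=1)

# For plain arithmetic, the vectorized form gives the same values
# and is usually faster than apply.
df["total_vectorized"] = df["a"] + df["b"]

print(df)
```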


1 Answer

Edit 2017

As indicated in the comments and by @Alexander, currently the best method to add the values of a Series as a new column of a DataFrame could be using assign:

df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values) 
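For readers who want to run this, here is a self-contained sketch of the assign approach. The contents of `df1` are made up, and a seeded generator is used only so the run is repeatable; neither comes from the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a non-contiguous index, as in the question.
df1 = pd.DataFrame({"a": [0.67, 0.45, 0.61]}, index=[2, 3, 5])
sLength = len(df1)

rng = np.random.default_rng(0)
new_values = pd.Series(rng.standard_normal(sLength))  # default index 0, 1, 2

# .values strips the Series index, so the numbers are placed by position
# instead of being matched against df1's labels 2, 3, 5.
result = df1.assign(e=new_values.values)

# assign returns a new DataFrame; df1 itself has no column "e"
# until the result is assigned back to it.
print(result)
print("e" in df1.columns)
```

Reassigning with `df1 = df1.assign(...)`, as in the line above, is what makes the new column stick.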

Edit 2015
Some reported getting the SettingWithCopyWarning with this code.
However, the code still runs perfectly with the current pandas version 0.16.1.

>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> pd.version.short_version
'0.16.1'

The SettingWithCopyWarning aims to inform you of a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily mean you did it wrong (it can trigger false positives), but from 0.13.0 it lets you know there are more adequate methods for the same purpose. If you get the warning, just follow its advice: Try using .loc[row_index,col_indexer] = value instead

>>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
>>>

In fact, this is currently the more efficient method, as described in the pandas docs.
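The warning typically appears when the frame being modified was itself derived from a filter or slice of another frame. The sketch below shows one common way to sidestep it: take an explicit copy first, then add the column with .loc. The frame, the filter, and the seeded generator are assumptions made up for this example, and whether the warning fires at all depends on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, -2.0, 3.0, -4.0]}, index=[2, 3, 5, 8])

# An explicit copy makes it unambiguous that `subset` is its own object,
# so adding a column to it cannot be mistaken for a write to `df`.
subset = df[df["a"] > 0].copy()

rng = np.random.default_rng(0)
# Build the new column on subset's own index, then assign through .loc.
subset.loc[:, "f"] = pd.Series(rng.standard_normal(len(subset)),
                               index=subset.index)

print(subset)
print("f" in df.columns)  # the original frame is untouched
```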


Original answer:

Use the original df1 indexes to create the series:

df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index) 
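The index matters because pandas aligns a Series on row labels when it is assigned as a column. The following sketch, with made-up values and a hypothetical column name `misaligned`, contrasts a Series that keeps its default index with one that reuses the index of `df1`.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same non-contiguous index as in the question.
df1 = pd.DataFrame({"a": [0.67, 0.45, 0.61]}, index=[2, 3, 5])
values = np.array([-0.34, -1.17, -0.39])

# Default index 0, 1, 2: only label 2 exists in df1, so that row receives
# the value stored under label 2 and rows 3 and 5 become NaN.
df1["misaligned"] = pd.Series(values)

# Reusing df1's index places the values row by row, in order.
df1["e"] = pd.Series(values, index=df1.index)

print(df1)
```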
joaquin answered Sep 27 '22 20:09