In pandas you can add/append a new column to the existing DataFrame using DataFrame. insert() method, this method updates the existing DataFrame with a new column. DataFrame. assign() is also used to insert a new column however, this method returns a new Dataframe after adding a new column.
I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...
), pandas requires that the right hand side be a DataFrame (note that it doesn't actually matter if the columns of the DataFrame have the same names as the columns you are creating).
Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...
). So the solution is either to convert this into several single-column assignments, or create a suitable DataFrame for the right-hand side.
Here are several approaches that will work:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'col_1': [0, 1, 2, 3],
'col_2': [4, 5, 6, 7]
})
Then one of the following:
df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]
DataFrame
conveniently expands a single row to match the index, so you can do this:df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)
df = pd.concat(
[
df,
pd.DataFrame(
[[np.nan, 'dogs', 3]],
index=df.index,
columns=['column_new_1', 'column_new_2', 'column_new_3']
)
], axis=1
)
join
instead of concat
(may be less efficient):df = df.join(pd.DataFrame(
[[np.nan, 'dogs', 3]],
index=df.index,
columns=['column_new_1', 'column_new_2', 'column_new_3']
))
df = df.join(pd.DataFrame(
{
'column_new_1': np.nan,
'column_new_2': 'dogs',
'column_new_3': 3
}, index=df.index
))
.assign()
with multiple column arguments.I like this variant on @zero's answer a lot, but like the previous one, the new columns will always be sorted alphabetically, at least with early versions of Python:
df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)
new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols) # add empty cols
df[new_cols] = new_vals # multi-column assignment works for existing cols
df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3
Note: many of these options have already been covered in other answers: Add multiple columns to DataFrame and set them equal to an existing column, Is it possible to add several columns at once to a pandas DataFrame?, Add multiple empty columns to pandas DataFrame
You could use assign
with a dict of column names and values.
In [1069]: df.assign(**{'col_new_1': np.nan, 'col2_new_2': 'dogs', 'col3_new_3': 3})
Out[1069]:
col_1 col_2 col2_new_2 col3_new_3 col_new_1
0 0 4 dogs 3 NaN
1 1 5 dogs 3 NaN
2 2 6 dogs 3 NaN
3 3 7 dogs 3 NaN
With the use of concat:
In [128]: df
Out[128]:
col_1 col_2
0 0 4
1 1 5
2 2 6
3 3 7
In [129]: pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
Out[129]:
col_1 col_2 column_new_1 column_new_2 column_new_3
0 0.0 4.0 NaN NaN NaN
1 1.0 5.0 NaN NaN NaN
2 2.0 6.0 NaN NaN NaN
3 3.0 7.0 NaN NaN NaN
Not very sure of what you wanted to do with [np.nan, 'dogs',3]
. Maybe now set them as default values?
In [142]: df1 = pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
In [143]: df1[[ 'column_new_1', 'column_new_2','column_new_3']] = [np.nan, 'dogs', 3]
In [144]: df1
Out[144]:
col_1 col_2 column_new_1 column_new_2 column_new_3
0 0.0 4.0 NaN dogs 3
1 1.0 5.0 NaN dogs 3
2 2.0 6.0 NaN dogs 3
3 3.0 7.0 NaN dogs 3
use of list comprehension, pd.DataFrame
and pd.concat
pd.concat(
[
df,
pd.DataFrame(
[[np.nan, 'dogs', 3] for _ in range(df.shape[0])],
df.index, ['column_new_1', 'column_new_2','column_new_3']
)
], axis=1)
if adding a lot of missing columns (a, b, c ,....) with the same value, here 0, i did this:
new_cols = ["a", "b", "c" ]
df[new_cols] = pd.DataFrame([[0] * len(new_cols)], index=df.index)
It's based on the second variant of the accepted answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With