I have the following code where I create df['var2'] and alter df['var1']. After performing these changes, I would like to append newrow (with df['var2']) to the dataframe while keeping the original (though now altered) row (which has df['var1']).
for i, row in df.iterrows():
    while row['var1'] > 30:
        newrow = row
        newrow['var2'] = 30
        row['var1'] = row['var1']-30
        df.append(newrow)
I understand that when using iterrows(), row variables are copies instead of views, which is why the changes are not being updated in the original dataframe. So, how would I alter this code to actually append newrow to the dataframe?
Thank you!
Use concat() to append. pd.concat([new_row, df.loc[:]]).reset_index(drop=True) puts the new row at the first position of the DataFrame (position 0, since the index starts from zero); to append at the end instead, list df first.
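As a minimal sketch of the concat() pattern above, assuming new_row is a one-row DataFrame with the same columns as df (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"var1": [10.0, 20.0], "var2": [1.0, 2.0]})
new_row = pd.DataFrame({"var1": [99.0], "var2": [30.0]})

# Listing new_row first places it at position 0;
# reset_index(drop=True) renumbers the rows from zero.
prepended = pd.concat([new_row, df]).reset_index(drop=True)

# Listing df first places new_row at the end instead.
appended = pd.concat([df, new_row]).reset_index(drop=True)
```

Neither call modifies df; each returns a new DataFrame.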
iterrows() is used to iterate over the rows of a pandas DataFrame as (index, Series) pairs. Each tuple holds the row's index label and the row's contents as a Series keyed by column name.
The iterrows() method generates an iterator object of the DataFrame, allowing us to iterate each row in the DataFrame. Each iteration produces an index object and a row object (a Pandas Series object).
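A small sketch of what each iteration yields, using a made-up two-row frame (the data is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"var1": [45.0, 12.0], "var2": [1.0, 2.0]})

labels = []
for idx, row in df.iterrows():
    # idx is the row's index label; row is a Series whose own
    # index is the DataFrame's column names.
    assert isinstance(row, pd.Series)
    assert list(row.index) == ["var1", "var2"]
    labels.append(idx)
```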
append() appends the rows of another dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframe are added as new columns, and the new cells are populated with NaN. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() is its replacement.
It is generally inefficient to append rows to a dataframe in a loop because a new copy is returned. You are better off storing the intermediate results in a list and then concatenating everything together at the end.
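The list-then-concatenate idea can be sketched as follows. The sample data and the variable name remaining are assumptions for illustration, and this sketch only collects the new rows; it does not alter df itself:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"var1": [75.0, 20.0], "var2": [1.0, 2.0]})

new_rows = []  # plain Python list; appending to it is cheap
for idx, row in df.iterrows():
    remaining = row["var1"]
    while remaining > 30:
        remaining -= 30
        new_rows.append({"var1": remaining, "var2": 30.0})

# One concatenation at the end instead of one per loop iteration.
extra = pd.DataFrame(new_rows, columns=df.columns)
df_new = pd.concat([df, extra], ignore_index=True)
```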
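Using row.loc['var1'] = row['var1'] - 30 can make an in-place change to the original dataframe, but only when row happens to be a view of the dataframe's data (for example, when all columns share one dtype, as in the example that follows). The pandas documentation warns that iterrows() may return a copy rather than a view, so this is not guaranteed to work in all cases.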
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 2) * 100, columns=['var1', 'var2'])
>>> df
var1 var2
0 176.405235 40.015721
1 97.873798 224.089320
2 186.755799 -97.727788
3 95.008842 -15.135721
4 -10.321885 41.059850
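new_rows = []
for i, row in df.iterrows():
    while row['var1'] > 30:
        newrow = row  # same object as row, not a copy
        newrow['var2'] = 30
        row.loc['var1'] = row['var1'] - 30
        # the same underlying values are stored on every pass
        new_rows.append(newrow.values)
# pd.concat replaces df.append, which was removed in pandas 2.0
df_new = pd.concat([df, pd.DataFrame(new_rows, columns=df.columns)]).reset_index()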
>>> df
var1 var2
0 26.405235 30.00000
1 7.873798 30.00000
2 6.755799 30.00000
3 5.008842 30.00000
4 -10.321885 41.05985
>>> df_new
var1 var2
0 26.405235 30.00000
1 7.873798 30.00000
2 6.755799 30.00000
3 5.008842 30.00000
4 -10.321885 41.05985
5 26.405235 30.00000
6 26.405235 30.00000
7 26.405235 30.00000
8 26.405235 30.00000
9 26.405235 30.00000
10 7.873798 30.00000
11 7.873798 30.00000
12 7.873798 30.00000
13 6.755799 30.00000
14 6.755799 30.00000
15 6.755799 30.00000
16 6.755799 30.00000
17 6.755799 30.00000
18 6.755799 30.00000
19 5.008842 30.00000
20 5.008842 30.00000
21 5.008842 30.00000
EDIT (per request below):
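new_rows = []
for i, row in df.iterrows():
    while row['var1'] > 30:
        # update the row and keep the reduced value as a plain float
        row.loc['var1'] = var1 = row['var1'] - 30
        new_rows.append([var1, 30])
# pd.concat replaces df.append, which was removed in pandas 2.0
df_new = pd.concat([df, pd.DataFrame(new_rows, columns=df.columns)]).reset_index()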
>>> df_new
index var1 var2
0 0 26.405235 40.015721
1 1 7.873798 224.089320
2 2 6.755799 -97.727788
3 3 5.008842 -15.135721
4 4 -10.321885 41.059850
5 0 146.405235 30.000000
6 1 116.405235 30.000000
7 2 86.405235 30.000000
8 3 56.405235 30.000000
9 4 26.405235 30.000000
10 5 67.873798 30.000000
11 6 37.873798 30.000000
12 7 7.873798 30.000000
13 8 156.755799 30.000000
14 9 126.755799 30.000000
15 10 96.755799 30.000000
16 11 66.755799 30.000000
17 12 36.755799 30.000000
18 13 6.755799 30.000000
19 14 65.008842 30.000000
20 15 35.008842 30.000000
21 16 5.008842 30.000000
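For a version that does not depend on row being a view of the dataframe, one option is to read and write cells explicitly with df.at. This is a sketch under the same seed and column names as above, not the original answer's method; ignore_index=True is used here, so no extra index column appears in the result:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 2) * 100, columns=["var1", "var2"])

new_rows = []
for i in df.index:
    var1 = df.at[i, "var1"]
    while var1 > 30:
        var1 -= 30
        new_rows.append([var1, 30.0])
    # Explicit write-back, so the change reaches df regardless of
    # whether iteration would have produced a view or a copy.
    df.at[i, "var1"] = var1

df_new = pd.concat(
    [df, pd.DataFrame(new_rows, columns=df.columns)], ignore_index=True
)
```

The original rows keep their var2 values, and each appended row carries a reduced var1 with var2 set to 30.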