I have a dataframe for which I'd like to update a column with some values from an array. The array is of a different lengths to the dataframe however, but I have the indices for the rows of the dataframe that I'd like to update.
I can do this with a loop through the rows (below) but I expect there is a much more efficient way to do this via a vectorized approach, but I can't seem to get the syntax correct.
In the example below I just fill the column with nan and then use the indices directly through a loop.
df['newcol'] = np.nan
j = 0
for i in update_idx:
df['newcol'][i] = new_values[j]
j+=1
if you have a list of indices already then you can use loc to perform label (row) selection, you can pass the new column name, where your existing rows are not selected these will have NaN assigned:
df.loc[update_idx, 'new_col'] = new_value
Example:
In [4]:
df = pd.DataFrame({'a':np.arange(5), 'b':np.random.randn(5)}, index = list('abcde'))
df
Out[4]:
a b
a 0 1.800300
b 1 0.351843
c 2 0.278122
d 3 1.387417
e 4 1.202503
In [5]:
idx_list = ['b','d','e']
df.loc[idx_list, 'c'] = np.arange(3)
df
Out[5]:
a b c
a 0 1.800300 NaN
b 1 0.351843 0
c 2 0.278122 NaN
d 3 1.387417 1
e 4 1.202503 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With