
Vectorized update to pandas DataFrame?

I have a DataFrame in which I'd like to update a column with values from an array. The array is a different length from the DataFrame, but I have the indices of the rows I'd like to update.

I can do this with a loop over the rows (below), but I expect a vectorized approach would be much more efficient. I can't seem to get the syntax right.

In the example below I fill the column with NaN and then assign the values one at a time, using the indices in a loop.

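# Start with a column of NaN so rows that are not updated stay missing
df['newcol'] = np.nan

# Assign one value per row label; .loc avoids chained indexing (df['newcol'][i])
for i, value in zip(update_idx, new_values):
    df.loc[i, 'newcol'] = value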
asked Nov 15 '25 00:11 by anthr


1 Answer

If you already have a list of indices, you can use loc to perform label (row) selection and pass the new column name in the same call. Rows that are not selected will have NaN assigned:

df.loc[update_idx, 'newcol'] = new_values

Example:

In [4]:
df = pd.DataFrame({'a':np.arange(5), 'b':np.random.randn(5)}, index = list('abcde'))
df

Out[4]:
   a         b
a  0  1.800300
b  1  0.351843
c  2  0.278122
d  3  1.387417
e  4  1.202503

In [5]:    
idx_list = ['b','d','e']
df.loc[idx_list, 'c'] = np.arange(3)
df

Out[5]:
   a         b   c
a  0  1.800300 NaN
b  1  0.351843   0
c  2  0.278122 NaN
d  3  1.387417   1
e  4  1.202503   2
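
The example above selects rows by label. The question does not say whether its indices are labels or integer positions; if they are positions, one option is to translate them to labels first and then assign with loc as before. A minimal sketch under that assumption, using made-up data and the hypothetical name update_pos:

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the question's DataFrame and array
df = pd.DataFrame({'a': np.arange(5)}, index=list('abcde'))
update_pos = [1, 3, 4]                      # integer row positions (assumed), not labels
new_values = np.array([10.0, 20.0, 30.0])   # one value per selected row

# Create the column first so every unselected row holds NaN
df['newcol'] = np.nan

# Translate positions to labels, then assign all values in a single call
df.loc[df.index[update_pos], 'newcol'] = new_values

print(df)
```

The array must have the same length as the list of selected rows; pandas raises an error on a length mismatch rather than silently truncating or recycling values.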
answered Nov 17 '25 20:11 by EdChum


