My table currently has over 10,000,000 records and a column named ID.
I want to update the column named '3rd_col' with a new value wherever the ID
is in a given list.
I use .loc, and here is my code:
for _id in given_ids:
    df.loc[df.ID == _id, '3rd_col'] = new_value
But the code above is slow. How can I make the update faster?
Sorry, let me be more specific about my problem: each ID gets a different value, computed by a function, and there are about 4 columns to assign.
for _id in given_ids:
    # pass the loop variable _id (a bare `id` would refer to the Python builtin)
    df.loc[df.ID == _id, '3rd_col'] = return_new_val_1(_id)
    df.loc[df.ID == _id, '4rd_col'] = return_new_val_2(_id)
    df.loc[df.ID == _id, '5rd_col'] = return_new_val_3(_id)
    df.loc[df.ID == _id, '6rd_col'] = return_new_val_4(_id)
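For the simpler case in the first snippet, where every listed ID receives the same value, the loop can likely be replaced by a single boolean mask built with isin. This is a minimal sketch; the sample frame and the value 'new' are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the large table
df = pd.DataFrame({'ID': list('abdefc'),
                   '3rd_col': ['old'] * 6})
given_ids = ['a', 'b', 'c']
new_value = 'new'  # assumed placeholder value

# One vectorized membership test instead of one full comparison per id
mask = df['ID'].isin(given_ids)
df.loc[mask, '3rd_col'] = new_value
print(df)
```

The loop compares the whole ID column once per id, so its cost grows with len(given_ids) times the number of rows; the mask is computed in a single pass.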
You can create a dictionary first and then use replace:
import pandas as pd
# sample function
def return_new_val(x):
    return x * 3
given_ids = list('abc')
d = {_id: return_new_val(_id) for _id in given_ids}
print (d)
{'a': 'aaa', 'c': 'ccc', 'b': 'bbb'}
df = pd.DataFrame({'ID':list('abdefc'),
                   'M':[4,5,4,5,5,4]})
df['3rd_col'] = df['ID'].replace(d)
print (df)
ID M 3rd_col
0 a 4 aaa
1 b 5 bbb
2 d 4 d
3 e 5 e
4 f 5 f
5 c 4 ccc
Or use map, but then you get NaNs where there is no match:
df['3rd_col'] = df['ID'].map(d)
print (df)
ID M 3rd_col
0 a 4 aaa
1 b 5 bbb
2 d 4 NaN
3 e 5 NaN
4 f 5 NaN
5 c 4 ccc
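If '3rd_col' already exists and unmatched rows should keep their current values, as in the question, one option is to fill the NaNs produced by map from the existing column. A minimal sketch, assuming a pre-existing '3rd_col' filled with the made-up value 'old':

```python
import pandas as pd

# sample function, as in the answer above
def return_new_val(x):
    return x * 3

# Hypothetical frame in which '3rd_col' already holds values
df = pd.DataFrame({'ID': list('abdefc'),
                   '3rd_col': ['old'] * 6})
given_ids = list('abc')
d = {_id: return_new_val(_id) for _id in given_ids}

# map yields NaN for IDs missing from d; fillna restores the old values there
df['3rd_col'] = df['ID'].map(d).fillna(df['3rd_col'])
print(df)
```

Note that this also overwrites any NaN the function itself returns with the old value, so it only fits when the new values are never missing.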
EDIT:
If you need to add columns computed by multiple functions, first create a new DataFrame and then join it to the original:
def return_new_val1(x):
    return x * 2
def return_new_val2(x):
    return x * 3
given_ids = list('abc')
df2 = pd.DataFrame({'ID':given_ids})
df2['3rd_col'] = df2['ID'].map(return_new_val1)
df2['4rd_col'] = df2['ID'].map(return_new_val2)
df2 = df2.set_index('ID')
print (df2)
3rd_col 4rd_col
ID
a aa aaa
b bb bbb
c cc ccc
df = pd.DataFrame({'ID':list('abdefc'),
                   'M':[4,5,4,5,5,4]})
df = df.join(df2, on='ID')
print (df)
ID M 3rd_col 4rd_col
0 a 4 aa aaa
1 b 5 bb bbb
2 d 4 NaN NaN
3 e 5 NaN NaN
4 f 5 NaN NaN
5 c 4 cc ccc
# but replace NaNs by values in `ID`
cols = ['3rd_col','4rd_col']
df[cols] = df[cols].mask(df[cols].isnull(), df['ID'], axis=0)
print (df)
ID M 3rd_col 4rd_col
0 a 4 aa aaa
1 b 5 bb bbb
2 d 4 d d
3 e 5 e e
4 f 5 f f
5 c 4 cc ccc
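When the target columns already exist in df and only the rows for the given IDs should change, a variation is to build the same kind of lookup table and assign through a boolean mask, leaving all other rows untouched. This is a sketch under assumptions: the existing values 'old3' and 'old4' are invented, and given_ids is assumed to contain no duplicates.

```python
import pandas as pd

def return_new_val1(x):
    return x * 2

def return_new_val2(x):
    return x * 3

# Hypothetical frame in which both target columns already exist
df = pd.DataFrame({'ID': list('abdefc'),
                   '3rd_col': ['old3'] * 6,
                   '4rd_col': ['old4'] * 6})
given_ids = list('abc')  # assumed unique
cols = ['3rd_col', '4rd_col']

# Lookup table indexed by id: each function runs once per id, not once per row
df2 = pd.DataFrame({'3rd_col': [return_new_val1(i) for i in given_ids],
                    '4rd_col': [return_new_val2(i) for i in given_ids]},
                   index=given_ids)

# Select matching rows once, then fetch their new values in the same row order
mask = df['ID'].isin(df2.index)
df.loc[mask, cols] = df2.loc[df.loc[mask, 'ID'], cols].to_numpy()
print(df)
```

Converting the right-hand side with to_numpy makes the assignment positional, so the lookup rows are not realigned against the row labels of df.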