Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

setting values in a pandas dataframe using loc - multiple selection criteria allowing setting value in a different column

I have a database with multiple columns and rows. I want to locate within the database rows that meet certain criteria of a subset of the columns AND if it meets that criteria change the value of a different column in that same row.

I am prototyping with the following database

df = pd.DataFrame([[1, 2], [4, 5], [5, 5], [5, 9], [55, 55]], columns=['max_speed', 'shield'])
df['frcst_stus'] = 'current'
df

which gives the following result:

max_speed   shield  frcst_stus
0   1   2   current
1   4   5   current
2   5   5   current
3   5   9   current
4   55  55  current

I want to change index row 2 to read 5, 5, 'hello' without changing the rest of the dataframe.

I can do the examples in the Pandas.loc documentation at setting values. I can set a row, a column, and rows matching a callable condition. But the call is on a single column or series. I want two.

And I have found a number of stackoverflow answers that answer the question using loc on a single column to set a value in a second column. That's not my issue. I want to search two columns worth of data.

The following allows me to get the row I want:

result = df[(df['shield'] == 5) & (df['max_speed'] == 5) & (df['frcst_stus'] == 'current')]

And I know that just changing the equal signs (== 'current') to (= 'current') gives me an error.

And when I select on two columns I can set the columns (see below), but both columns get set. ('arghh') and when I try to test the value of 'max_speed' I get a false is not in index error.

df.loc[:, ['max_speed', 'frcst_stus']] = 'hello'

I also get an error trying to explain the boolean issues with Python. Frankly, I just don't understand the whole overloading yet.

like image 371
wiseass Avatar asked Dec 13 '22 10:12

wiseass


1 Answers

If need to set different values to both columns by mask m:

m = (df['shield'] == 5) & (df['max_speed'] == 5) & (df['frcst_stus'] == 'current')

df.loc[m, ['max_speed', 'frcst_stus']] = [100, 'hello']

If need to set same values to both columns by mask m:

df.loc[m, ['max_speed', 'frcst_stus']] = 'hello'

If need to set only one column by mask m:

df.loc[m, 'frcst_stus'] = 'hello'
like image 92
jezrael Avatar answered Jan 08 '23 12:01

jezrael