Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

apply function on subset of dataframe rows in column based on value in other column

New to pandas, so please bear with me.

I have a text processing function I'd like to run on a column in my datafame conditional on the value in another column. I've seen

Depending on whether something is flagged, I want to run a translation function on it.

   account  article    ... translation  flag
0    123      text      ...               1
1    123      text      ...               0
2    123      text      ...               1

I tried:

df['translation'] = df[['flag', 'text']].apply(lambda x: translate(['article']) if ['flag'] == 1 else None)

and get this in return:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index flag')

Any help or guidance would be appreciated.

like image 633
Alex Avatar asked Jan 26 '23 16:01

Alex


1 Answers

I used a test DataFrame similar to yours, without translation column:

   account article  flag
0      123   text1     1
1      123   text2     0
2      123   text3     1

Then I defined a "surrogate" translation function:

def translate(txt):
    return '_' + txt + '_'

And to call it conditionally, run:

df['translation'] = df.apply(lambda row:
    translate(row.article) if row.flag == 1 else None, axis=1)

The result is:

   account article  flag translation
0      123   text1     1     _text1_
1      123   text2     0        None
2      123   text3     1     _text3_

What was wrong in your code:

  1. In you want to limit source data to a subset of columns, use existing column names (article instead text) and include all columns used in the applied function.

  2. The lambda function is applied to each row, so you should have passed axis=1 parameter (default axis is 0).

  3. When your function is called, the current row is passed as a parameter (x), but to refer to some column in it you should have used x.column_name notation. E.g. my solution could also be:

     df[['article', 'flag']].apply(lambda row:
         translate(row.article) if row.flag == 1 else None, axis=1)
    
  4. Argument like ['article'] is here just a list, containing one word (article). I doubt whether your translation function is able to process a list argument.

  5. A similar remark concerning if ['flag'] .... This is not a reference to a column in the source row.

like image 91
Valdi_Bo Avatar answered Apr 29 '23 18:04

Valdi_Bo