Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Dynamically deciede which column to update based on another column's value

Tags:

python

pandas

Let's say we have the following dataframe

    handle  image_src   image_position  image_src_1 image_src_2 image_src_3
0        a       img1                1         None        None        None
1        b       img2                2         None        None        None
2        c       img3                3         None        None        None

How can we convert this datafame into this?

    handle  image_src   image_position  image_src_1 image_src_2 image_src_3
0        a       img1                1         img1        None        None
1        b       img2                2         None        img2        None
2        c       img3                3         None        None        img3

Basically I want to fill the image_src_{image_position} column based on the value of image_position of each row.

I come up with a solution which involves manually iterating over the whole dataframe, like this

data = [
    {"handle": "a","image_src": "img1","image_position": 1,"image_src_1": None,"image_src_2": None,"image_src_3": None},
    {"handle": "b","image_src": "img2","image_position": 2,"image_src_1": None,"image_src_2": None,"image_src_3": None},
    {"handle": "c","image_src": "img3","image_position": 3,"image_src_1": None,"image_src_2": None,"image_src_3": None}
]

df = pd.DataFrame(data)

for index in range(0, len(df)):
    row = df.iloc[index]
    
    position = row["image_position"].astype("int64")
    
    df.loc[index, f"image_src_{position}"] = row["image_src"]

But as iterating over the whole dataframe is a bad thing, how can I imporove this?

like image 686
Sazzadur Rahman Avatar asked Sep 11 '25 14:09

Sazzadur Rahman


1 Answers

First that came to my mind was .pivot() function. Though this might not be the exact thing you wanted, — the original image_src column would be dropped, the column naming would slightly differ and nan might need handling, — still it's rather a short solution.

(df.pivot(index=['handle', 'image_position'],  # to keep these columns intact
          columns='image_src',
          values='image_src')
   .add_prefix('image_src_')
   .reset_index())

What I got:

handle image_position image_src_img1 image_src_img2 image_src_img3
0 a 1 img1 nan nan
1 b 2 nan img2 nan
2 c 3 nan nan img3

Another take is to use image_position column as an index for the new columns, thus it would be dropped too. Also you may want to replace nan with None:

import numpy as np

(df.pivot(index='handle',
          columns='image_position',
          values='image_src')
   .add_prefix('image_src_')
   .replace({np.nan: None})
   .reset_index())

Goes like this:

handle image_src_1 image_src_2 image_src_3
0 a img1 None None
1 b None img2 None
2 c None None img3
like image 82
Nikita Shabankin Avatar answered Sep 14 '25 04:09

Nikita Shabankin