Assigning a column value based on multiple column conditions in python

Question

I've got a pandas dataframe that is structured as such,

ID  Col1  Col2
1   50    12:23:01
1   34    12:25:11
1   65    12:32:25
1   98    12:45:08
2   23    11:09:10
2   12    11:12:43
2   56    11:13:12
2   34    11:14:26
2   77    11:16:02
3   64    14:01:11
3   34    14:01:13
3   48    14:02:32

What I need is to be able to search within a repeating ID value to find a condition in column 1, say Col1==34. Based on this, I need to create a new column, Col3, that takes on the corresponding value in Col2. The end result I need is shown below.

ID  Col1  Col2      Col3
1   50    12:23:01  12:25:11
1   34    12:25:11  12:25:11
1   65    12:32:25  12:25:11
1   98    12:45:08  12:25:11
2   23    11:09:10  11:14:26
2   12    11:12:43  11:14:26
2   56    11:13:12  11:14:26
2   34    11:14:26  11:14:26
2   77    11:16:02  11:14:26
3   64    14:01:11  14:01:13
3   34    14:01:13  14:01:13
3   48    14:02:32  14:01:13

I've tried the following, but it's not pulling the distinct Col2 value, rather it's just duplicating Col2

df['Col3'] = np.where(df.Col1.isin(df[df.Col2==34].Col1), df['Col2'], 0)

I realize that assigning the df['Col2'] else 0 from the where condition is most likely my logic issue, and that there is probably some easy concise way of doing this (or that my time might be better spent in SQL), but I'm not sure on how to set this up. Thanks in advance.

piRSquared · Accepted Answer

using query + map

df['Col3'] = df.ID.map(df.query('Col1 == 34').set_index('ID').Col2)

print(df)

    ID  Col1      Col2      Col3
0    1    50  12:23:01  12:25:11
1    1    34  12:25:11  12:25:11
2    1    65  12:32:25  12:25:11
3    1    98  12:45:08  12:25:11
4    2    23  11:09:10  11:14:26
5    2    12  11:12:43  11:14:26
6    2    56  11:13:12  11:14:26
7    2    34  11:14:26  11:14:26
8    2    77  11:16:02  11:14:26
9    3    64  14:01:11  14:01:13
10   3    34  14:01:13  14:01:13
11   3    48  14:02:32  14:01:13

dealing with duplicates

# keep first instance
df.ID.map(df.query('Col1 == 34') \
    .drop_duplicates(subset=['ID']).set_index('ID').Col2)

Or

# keep last instance
df.ID.map(df.query('Col1 == 34') \
    .drop_duplicates(subset=['ID'], keep='last').set_index('ID').Col2)

Assigning a column value based on multiple column conditions in python

Tags:

python

variable-assignment

pandas

numpy

rastrast

1 Answers

piRSquared

Recent Activity

Donate For Us

Assigning a column value based on multiple column conditions in python

Tags:

python

variable-assignment

pandas

numpy

rastrast

1 Answers

piRSquared

Related questions

Recent Activity

Donate For Us