How to get specific unique combinations of a dataframe using only dataframe operations?

Question

I have this data that contains entries of 2 players playing a game one after another then after they both gain a win or lose they get a shared score (logic of this is unimportant numbers are random anyway this is just an example to describe what I want).

So there are scores obtained for every possible outcome after player P1 and player P2 have played.

The logic of the game is not important, all I want to know is if I can create a new dataframe of all unique combinations of these 4 players playing using my initial dataframe. So calculate a new score for all possible combinations of these 4 Players if they all play and get a score together, let's say their total scores would be summed up.

Example:

Player_1 Player_2 Player_3 Player_4 Outcome_1 Outcome_2  Outcome_3 Outcome_4 Score
P1       P2       P3        P4         win       win       win       win      72

and other possible unique combinations.

The key is to get a score of 30 from the combination where both P1 and P2 win and get 42 from the combination where both P3 and P4 have won and sum them to create the score if these 4 players have played and they have all won.

I can do this with generating unique combinations etc., but in a real use case with larger parameters etc. its too long and results in dirty hard to read code. What I want to know is is there a way to achieve this using only operations such as merge, groupby, join, agg etc.

import pandas as pd

data = {
    "Player_1": ["P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2", "P1", "P1", "P1", "P1", "P3", "P3", "P3", "P3"],
    "Player_2": ["P2", "P2", "P2", "P2", "P3", "P3", "P3", "P3", "P4", "P4", "P4", "P4", "P4", "P4", "P4", "P4"],
    "Outcome_1": ["win", "win", "lose", "lose", "win", "win", "lose", "lose", "win", "win", "lose", "lose", "win", "win", "lose", "lose"],
    "Outcome_2": ["win", "lose", "win", "lose", "win", "lose", "win", "lose", "win", "lose", "win", "lose", "win", "lose", "win", "lose"],
    "Score": [30, 45, 12, 78, 56, 21, 67, 90, 15, 32, 68, 88, 42, 74, 8, 93]
}

df = pd.DataFrame(data)

print(df)

   Player_1 Player_2 Outcome_1 Outcome_2  Score
0        P1       P2       win       win     30
1        P1       P2       win      lose     45
2        P1       P2      lose       win     12
3        P1       P2      lose      lose     78
4        P2       P3       win       win     56
5        P2       P3       win      lose     21
6        P2       P3      lose       win     67
7        P2       P3      lose      lose     90
8        P1       P4       win       win     15
9        P1       P4       win      lose     32
10       P1       P4      lose       win     68
11       P1       P4      lose      lose     88
12       P3       P4       win       win     42
13       P3       P4       win      lose     74
14       P3       P4      lose       win      8
15       P3       P4      lose      lose     93

Andrej Kesely · Accepted Answer

I hope I've understood your question right. From the comments I'm supposing you have total 4 players in the dataframe:

from itertools import product

p1, p2, p3, p4 = np.unique(df[["Player_1", "Player_2"]].values)
df = df.set_index(["Player_1", "Player_2", "Outcome_1", "Outcome_2"])

all_data = []
for p1o1, p2o2, p3o1, p4o2 in product(["win", "lose"], repeat=4):
    all_data.append(
        (
            p1,
            p2,
            p3,
            p4,
            p1o1,
            p2o2,
            p3o1,
            p4o2,
            df.loc[(p1, p2, p1o1, p2o2), "Score"]
            + df.loc[(p3, p4, p3o1, p4o2), "Score"],
        )
    )

out = pd.DataFrame(
    all_data,
    columns=[
        "Player_1",
        "Player_2",
        "Player_3",
        "Player_4",
        "Outcome_1",
        "Outcome_2",
        "Outcome_3",
        "Outcome_4",
        "Score",
    ],
)

Prints:

   Player_1 Player_2 Player_3 Player_4 Outcome_1 Outcome_2 Outcome_3 Outcome_4  Score
0        P1       P2       P3       P4       win       win       win       win     72
1        P1       P2       P3       P4       win       win       win      lose    104
2        P1       P2       P3       P4       win       win      lose       win     38
3        P1       P2       P3       P4       win       win      lose      lose    123
4        P1       P2       P3       P4       win      lose       win       win     87
5        P1       P2       P3       P4       win      lose       win      lose    119
6        P1       P2       P3       P4       win      lose      lose       win     53
7        P1       P2       P3       P4       win      lose      lose      lose    138
8        P1       P2       P3       P4      lose       win       win       win     54
9        P1       P2       P3       P4      lose       win       win      lose     86
10       P1       P2       P3       P4      lose       win      lose       win     20
11       P1       P2       P3       P4      lose       win      lose      lose    105
12       P1       P2       P3       P4      lose      lose       win       win    120
13       P1       P2       P3       P4      lose      lose       win      lose    152
14       P1       P2       P3       P4      lose      lose      lose       win     86
15       P1       P2       P3       P4      lose      lose      lose      lose    171

How to get specific unique combinations of a dataframe using only dataframe operations?

Tags:

python

pandas

dataframe

Cem Koçak

1 Answers

Andrej Kesely

Recent Activity

Donate For Us

How to get specific unique combinations of a dataframe using only dataframe operations?

Tags:

python

pandas

dataframe

Cem Koçak

1 Answers

Andrej Kesely

Related questions

Recent Activity

Donate For Us