I have a pandas DataFrame with one list-column, say:
df = pd.DataFrame({"pairs": [["A|B", "B|C", "C|D", "D|F"], ["A|D", "D|F", "F|G", "G|D"], ["C|D", "D|X"]]})
The lists in the pairs column always contain consecutive pairs, where the 2 elements of the pairs are separated by a |. I want to kind of "flatten" the lists in these columns, by instead of storing the pairs, now storing the elements of the pairs in the same order. So the desired DataFrame looks like:
elements
[A, B, C, D, F]
[A, D, F, G, D]
[C, D, X]
(Edited: I also would like to have elements occur multiple times in a resulting list, like D in the second row)
This looks so simple and I can't believe there wouldn't be an efficient solution to this, but so far I have failed to find any python method that could help me
You can use set-comprehension:
df["elements"] = df["pairs"].apply(
    lambda x: {ww for w in x for ww in w.split("|")}
)
print(df)
Prints:
                  pairs         elements
0  [A|B, B|C, C|D, D|F]  {B, C, D, A, F}
1       [A|D, D|F, F|G]     {G, D, F, A}
2            [C|D, D|X]        {X, C, D}
If you want lists:
df["elements"] = df["pairs"].apply(
    lambda x: list({ww for w in x for ww in w.split("|")})
)
print(df)
                  pairs         elements
0  [A|B, B|C, C|D, D|F]  [D, F, A, C, B]
1       [A|D, D|F, F|G]     [G, D, A, F]
2            [C|D, D|X]        [X, D, C]
EDIT: To mantain order:
def fn(x):
    seen = set()
    out = []
    for v in x:
        for w in v.split("|"):
            if not w in seen:
                seen.add(w)
                out.append(w)
    return out
df["elements"] = df["pairs"].apply(fn)
print(df)
Prints:
                  pairs         elements
0  [A|B, B|C, C|D, D|F]  [A, B, C, D, F]
1  [A|D, D|F, F|G, G|D]     [A, D, F, G]
2            [C|D, D|X]        [C, D, X]
EDIT: To keep multiple elements + order:
from itertools import groupby, chain
def fn(x):
    return [v for v, _ in groupby(chain.from_iterable(v.split("|") for v in x))]
df["elements"] = df["pairs"].apply(fn)
print(df)
Prints:
                  pairs         elements
0  [A|B, B|C, C|D, D|F]  [A, B, C, D, F]
1  [A|D, D|F, F|G, G|D]  [A, D, F, G, D]
2            [C|D, D|X]        [C, D, X]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With