Adding tuple elements, parsed into pandas DataFrame

Tags: python, pandas

I have several Python lists of tuples:

[(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)]
[(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)]
[(3, 12), (0, 51)]
...

Each tuple has the format (key, value).

There are seven keys: 0, 1, 2, 3, 4, 5, 6

The intended output is a pandas DataFrame in which each column is named after a key and each list becomes one row:

import pandas as pd
print(df)

0    1    2    3    4    5    6 
91   30   0    0    61   398  0
0    72   19   31   192  75   72
51   0    0    12   0    0    0

Now, the problem I have conceptually is how to add the values of several tuples when their keys are the same.

I can access these values for a given list, e.g.

mylist = [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)]
keys =  [x[0] for x in mylist]

and

print(keys)
[0, 1, 5, 4, 0, 5]

I'm not sure how to create, e.g., a dictionary of key:value pairs that I could load into a pandas DataFrame.

EB2127 asked Mar 11 '26 17:03


1 Answer

Consider your data stored under the name tups

tups = [
    [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)],
    [(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)],
    [(3, 12), (0, 51)]
]

Option 0
Using np.bincount and crazy maps and zips and splats
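This works because np.bincount's first two arguments are the array of positions and the optional array of weights to add at those positions. Rows whose largest key is below 6 produce shorter arrays, so the DataFrame pads them with NaN, which then has to be filled with 0.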

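import numpy as np

# zip(*t) splits a row into (keys, values); bincount sums the values per key.
# Shorter rows are padded with NaN, so fill with 0 and cast back to int.
pd.DataFrame(
    list(map(lambda t: np.bincount(*zip(*t)), tups))
).fillna(0).astype(int)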

    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0
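
A variant of the same idea, offered only as a sketch: passing minlength to np.bincount makes every row come out with the same length, so no NaN padding appears in the first place. The names n_keys and row_totals are made up for this illustration, and n_keys = 7 assumes the keys are exactly the integers 0 to 6, as stated in the question. An empty list would need separate handling, because zip(*row) then yields nothing to unpack.

```python
import numpy as np
import pandas as pd

tups = [
    [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)],
    [(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)],
    [(3, 12), (0, 51)],
]

n_keys = 7  # assumption: keys are the integers 0..6


def row_totals(row, n_keys):
    """Sum the values of one list of (key, value) tuples per key."""
    keys, values = zip(*row)
    # weights= adds each value into the bin selected by its key;
    # minlength= pads the result so every row has n_keys entries.
    totals = np.bincount(keys, weights=values, minlength=n_keys)
    # bincount returns floats when weights are given, so cast back to int.
    return totals.astype(int)


bincount_df = pd.DataFrame([row_totals(row, n_keys) for row in tups])
```

Because every row already has seven entries, the columns come out as 0 to 6 without any fillna step.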

Option 1
Using comprehensions and summation over axis levels.

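# Keys are (row, key, position); the position keeps duplicate keys distinct.
# Grouping on the first two levels sums duplicates (Series.sum(level=...)
# is not available in recent pandas, so groupby is used instead).
pd.Series({
    (i, j, k): v
    for i, row in enumerate(tups)
    for k, (j, v) in enumerate(row)
}).groupby(level=[0, 1]).sum().unstack(fill_value=0)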

    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0
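
If the MultiIndex keys feel opaque, the same reshaping can be sketched with a long-form table and pivot_table. This is one possible alternative, not part of the original options; the column names "row", "key" and "value" and the variable names long_df and wide_df are chosen here purely for illustration.

```python
import pandas as pd

tups = [
    [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)],
    [(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)],
    [(3, 12), (0, 51)],
]

# One record per tuple: which list it came from, its key, and its value.
long_df = pd.DataFrame(
    [(i, key, value) for i, row in enumerate(tups) for key, value in row],
    columns=["row", "key", "value"],
)

# aggfunc="sum" adds up values that share the same (row, key) pair;
# fill_value=0 covers keys that never occur in a given row.
wide_df = long_df.pivot_table(
    index="row", columns="key", values="value", aggfunc="sum", fill_value=0
).astype(int)
```

The final astype(int) is a precaution in case the intermediate result is stored as floats.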

Option 2
You can use the DataFrame constructor on the result of using a defaultdict:

from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))

for i, row in enumerate(tups):
    for j, v in row:
        d[j][i] += v

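# Dict insertion order decides the row and column order, so sort both axes.
pd.DataFrame(d).sort_index().sort_index(axis=1).fillna(0).astype(int)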

    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0
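
Closer to the "dictionary of key:value pairs" mentioned in the question, here is a rough sketch that builds one dictionary per list with collections.Counter and hands the list of dictionaries to the DataFrame constructor. The helper name sum_by_key and the variable counter_df are hypothetical, introduced only for this example.

```python
from collections import Counter

import pandas as pd

tups = [
    [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)],
    [(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)],
    [(3, 12), (0, 51)],
]


def sum_by_key(row):
    """Return {key: summed value} for one list of (key, value) tuples."""
    totals = Counter()
    for key, value in row:
        totals[key] += value  # missing keys start at 0
    return dict(totals)


# Each dictionary becomes one row; keys absent from a row show up as NaN.
counter_df = (
    pd.DataFrame([sum_by_key(row) for row in tups])
    .sort_index(axis=1)  # columns otherwise follow first-appearance order
    .fillna(0)
    .astype(int)
)
```

Note that a key which never appears in any list gets no column at all with this approach; reindexing the columns against the full key range would be needed in that case.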

Option 3
Create a zero dataframe and update it via iteration

n, m = len(tups), max(j for row in tups for j, _ in row) + 1

df = pd.DataFrame(0, range(n), range(m))

for i, row in enumerate(tups):
    for j, v in row:
        df.at[i, j] += v

df

    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0

piRSquared answered Mar 14 '26 08:03


