Adding tuple elements, parsed into pandas DataFrame

Question

I have several Python lists of tuples:

[(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)]
[(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)]
[(3, 12), (0, 51)]
...

Each tuple has the format (key, value).

There are seven keys: 0, 1, 2, 3, 4, 5, 6

The intended output is a pandas DataFrame in which each column is named by its key and each row holds the summed values for one list:

import pandas as pd
print(df)

0    1    2    3    4    5    6
91   30   0    0    61   398  0
0    72   19   31   192  75   72
51   0    0    12   0    0    0

Now, the problem I have conceptually is how to add several tuple "values" when their keys are the same.

I can access these values for a given list, e.g.

mylist = [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)]
keys =  [x[0] for x in mylist]

and

print(keys)
[0, 1, 5, 4, 0, 5]

I'm not sure how to create, e.g., a dictionary of the key:value pairs that I could load into a pandas DataFrame.

piRSquared · Accepted Answer

Consider your data stored in a variable named tups:

import numpy as np
import pandas as pd

tups = [
    [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)],
    [(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)],
    [(3, 12), (0, 51)]
]

Option 0
Using np.bincount and crazy maps and zips and splats
This works because np.bincount's first two arguments are the array of positions and the optional array of weights to sum at each position.

pd.DataFrame(
    list(map(lambda t: np.bincount(*zip(*t)), tups))
).fillna(0).astype(int)  # fillna's downcast= is deprecated in recent pandas; cast explicitly

    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0
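
To see what np.bincount is doing for a single list, here is a minimal sketch on the first row only. The names row, keys, values and sums are just for illustration, and minlength=7 assumes you know in advance that the keys run from 0 to 6.

```python
import numpy as np

row = [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)]

# zip(*row) transposes the pairs into a tuple of keys and a tuple of values.
keys, values = zip(*row)

# bincount adds each weight into the slot named by its key.
# minlength=7 pads the result so that key 6 gets an explicit zero.
sums = np.bincount(keys, weights=values, minlength=7).astype(int)

print(sums.tolist())  # [91, 30, 0, 0, 61, 398, 0]
```

Passing minlength to every row would also make all rows the same length, so no NaN filling would be needed afterwards.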

Option 1
Using comprehensions and summation over axis levels.

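# The position k keeps repeated (row, key) pairs distinct inside the dict.
pd.Series({
    (i, j, k): v
    for i, row in enumerate(tups)
    for k, (j, v) in enumerate(row)
}).groupby(level=[0, 1]).sum().unstack(fill_value=0)  # Series.sum(level=...) was removed in pandas 2.0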

    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0
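
The third index level in Option 1 may look redundant, so here is a small sketch of why it is there. It uses a made-up one-row input (rows) in which key 0 appears twice. Dictionary keys must be unique, so without the position the second value would silently replace the first.

```python
rows = [[(0, 61), (0, 30)]]  # key 0 appears twice in row 0

# Keyed by (row, key) only: the second entry overwrites the first.
without_k = {(i, j): v for i, row in enumerate(rows) for j, v in row}

# Keyed by (row, key, position): both entries survive and can be summed later.
with_k = {
    (i, j, k): v
    for i, row in enumerate(rows)
    for k, (j, v) in enumerate(row)
}

print(without_k)  # {(0, 0): 30}
print(with_k)     # {(0, 0, 0): 61, (0, 0, 1): 30}
```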

Option 2
Accumulate the sums in a nested defaultdict, then pass it to the DataFrame constructor:

from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))

for i, row in enumerate(tups):
    for j, v in row:
        d[j][i] += v

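# Dicts keep insertion order, so sort both axes; missing cells are NaN until filled.
pd.DataFrame(d).sort_index().sort_index(axis=1).fillna(0).astype(int)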

    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0
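
If it helps, this sketch prints part of the intermediate dictionary from Option 2. The outer key is the column and the inner key is the row, and any (row, column) pair that never occurs is simply absent, which is why the DataFrame has gaps to fill.

```python
from collections import defaultdict

tups = [
    [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)],
    [(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)],
    [(3, 12), (0, 51)],
]

# Outer key = column (tuple key), inner key = row (list number).
d = defaultdict(lambda: defaultdict(int))
for i, row in enumerate(tups):
    for j, v in row:
        d[j][i] += v  # a missing entry starts at int() == 0

print(dict(d[0]))  # {0: 91, 2: 51}
print(dict(d[5]))  # {0: 398, 1: 75}
```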

Option 3
Create a zero dataframe and update it via iteration

n, m = len(tups), max(j for row in tups for j, _ in row) + 1

df = pd.DataFrame(0, range(n), range(m))

for i, row in enumerate(tups):
    for j, v in row:
        df.at[i, j] += v

df

    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0
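
A possible variation on Option 3, not part of the options above, is to do the accumulation in NumPy with np.add.at and wrap the result afterwards. This is only a sketch; the names rows, cols, vals and out are mine.

```python
import numpy as np
import pandas as pd

tups = [
    [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)],
    [(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)],
    [(3, 12), (0, 51)],
]

# Flatten to parallel sequences of row index, key, and value.
rows, cols, vals = zip(*[(i, j, v) for i, row in enumerate(tups) for j, v in row])

out = np.zeros((len(tups), max(cols) + 1), dtype=int)

# np.add.at is unbuffered, so repeated (row, key) pairs accumulate
# instead of overwriting each other as out[rows, cols] += vals would.
np.add.at(out, (np.array(rows), np.array(cols)), np.array(vals))

df = pd.DataFrame(out)
print(df)
```

This avoids the Python-level df.at updates, which may matter if the lists are long.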

Tags: python, pandas

EB2127

1 Answers

piRSquared
