I have several Python lists of tuples:
[(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)]
[(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)]
[(3, 12), (0, 51)]
...
Each tuple has the format (key, value).
There are seven keys: 0, 1, 2, 3, 4, 5, 6.
The intended output is a pandas DataFrame in which each column is named by its key and each row holds the summed values for one list:
import pandas as pd
print(df)
 0   1   2   3    4    5   6
91  30   0   0   61  398   0
 0  72  19  31  192   75  72
51   0   0  12    0    0   0
Conceptually, my problem is how to add up the tuple "values" whose keys are the same.
I can access these values for a given list, e.g.
mylist = [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)]
keys = [x[0] for x in mylist]
and
print(keys)
[0, 1, 5, 4, 0, 5]
I'm not sure how to create, e.g., a dictionary of the key:value pairs that I could load into a pandas DataFrame.
Consider your data stored in a variable named tups:
tups = [
    [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)],
    [(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)],
    [(3, 12), (0, 51)]
]
Option 0
Using np.bincount with map, zip and argument unpacking (splats).
This works because np.bincount's first two arguments are the array of positions and the optional array of weights to sum at each position.
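import numpy as np
import pandas as pd
# zip(*t) splits each row into (keys, values); bincount sums the values per key
pd.DataFrame(
    list(map(lambda t: np.bincount(*zip(*t)), tups))
).fillna(0).astype(int)
    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0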
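To see what the one-liner does to a single row, here is a minimal sketch that unpacks it step by step. The names row, keys, values and totals are illustrative, and minlength=7 is an assumption based on the seven keys mentioned in the question; it makes every row come out the same width, so no NaN padding would be needed.

```python
import numpy as np

row = [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)]

# zip(*row) transposes the pairs into one tuple of keys and one tuple of values.
keys, values = zip(*row)

# bincount adds up the weights that share a position (here: a key).
# minlength=7 pads the result so that keys 0..6 are all present.
totals = np.bincount(keys, weights=values, minlength=7)

# With weights, bincount returns floats.
print(totals.tolist())  # [91.0, 30.0, 0.0, 0.0, 61.0, 398.0, 0.0]
```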
Option 1
Using comprehensions and summation over index levels.
# k keeps repeated (row, key) pairs distinct; Series.sum(level=...) was removed in pandas 2.0
pd.Series({
    (i, j, k): v
    for i, row in enumerate(tups)
    for k, (j, v) in enumerate(row)
}).groupby(level=[0, 1]).sum().unstack(fill_value=0)
    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0
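The intermediate Series is worth a look on its own. The sketch below builds it and sums over the first two index levels (list number and key). It uses groupby(level=...).sum(), which is the spelling available in current pandas; the names s and summed are illustrative.

```python
import pandas as pd

tups = [
    [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)],
    [(1, 72), (2, 19), (3, 31), (4, 192), (6, 72), (5, 75)],
    [(3, 12), (0, 51)],
]

# Index is (list number, key, position in list). The position k keeps
# duplicates such as the two (0, 0, ...) entries from overwriting each other.
s = pd.Series({
    (i, j, k): v
    for i, row in enumerate(tups)
    for k, (j, v) in enumerate(row)
})

# Collapse the position level by summing over (list number, key).
summed = s.groupby(level=[0, 1]).sum()

print(len(s))              # 14 entries, one per tuple
print(summed.loc[(0, 0)])  # 91  (61 + 30)
print(summed.loc[(0, 5)])  # 398 (198 + 200)
```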
Option 2
You can use the DataFrame constructor on the result of using a defaultdict:
from collections import defaultdict
d = defaultdict(lambda: defaultdict(int))
for i, row in enumerate(tups):
    for j, v in row:
        d[j][i] += v
# sort rows and columns, since recent pandas keeps dict insertion order
pd.DataFrame(d).fillna(0).astype(int).sort_index().sort_index(axis=1)
    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0
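The same accumulation idea, reduced to the single list from the question, gives the key:value dictionary the question asks about. This is only a sketch; the name totals is illustrative.

```python
from collections import defaultdict

mylist = [(0, 61), (1, 30), (5, 198), (4, 61), (0, 30), (5, 200)]

# defaultdict(int) starts every missing key at 0, so += works on first sight.
totals = defaultdict(int)
for key, value in mylist:
    totals[key] += value

print(dict(totals))  # {0: 91, 1: 30, 5: 398, 4: 61}
```

Keys that never appear in the list (2, 3 and 6 here) are simply absent, which is why the DataFrame built from such dictionaries needs its missing cells filled with 0.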
Option 3
Create a DataFrame of zeros and update it via iteration.
n, m = len(tups), max(j for row in tups for j, _ in row) + 1
df = pd.DataFrame(0, range(n), range(m))
for i, row in enumerate(tups):
    for j, v in row:
        df.at[i, j] += v
df
    0   1   2   3    4    5   6
0  91  30   0   0   61  398   0
1   0  72  19  31  192   75  72
2  51   0   0  12    0    0   0