Simple dictionary:
d = {'a': set([1,2,3]), 'b': set([3, 4])}
(the sets may be turned into lists if it matters)
How do I convert it into a long/tidy DataFrame in which each column is a variable and every observation is a row, i.e.:
  letter  value
0      a      1
1      a      2
2      a      3
3      b      3
4      b      4
The following works, but it's a bit cumbersome:
id = 0
tidy_d = {}
for l, vs in d.items():
    for v in vs:
        tidy_d[id] = {'letter': l, 'value': v}
        id += 1
pd.DataFrame.from_dict(tidy_d, orient = 'index')
Is there any pandas magic to do this? Something like:
pd.DataFrame([d]).T.reset_index(level=0).unnest()
where unnest obviously doesn't exist and comes from R.
You can use a comprehension with itertools.chain and zip:
from itertools import chain
keys, values = map(chain.from_iterable, zip(*((k*len(v), v) for k, v in d.items())))
df = pd.DataFrame({'letter': list(keys), 'value': list(values)})
print(df)
  letter  value
0      a      1
1      a      2
2      a      3
3      b      3
4      b      4
This can be rewritten in a more readable fashion:
zipper = zip(*((k*len(v), v) for k, v in d.items()))
values = map(list, map(chain.from_iterable, zipper))
df = pd.DataFrame(list(values), columns=['letter', 'value'])
Use numpy.repeat with chain.from_iterable:
from itertools import chain
df = pd.DataFrame({
    'letter' : np.repeat(list(d.keys()), [len(v) for k, v in d.items()]),
    'value' : list(chain.from_iterable(d.values())), 
})
print (df)
  letter  value
0      a      1
1      a      2
2      a      3
3      b      3
4      b      4
A tad more "pandaic", inspired by this post:
pd.DataFrame.from_dict(d, orient = 'index') \
  .rename_axis('letter').reset_index() \
  .melt(id_vars = ['letter'], value_name = 'value') \
  .drop('variable', axis = 1) \
  .dropna()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With