I have a dataframe like this
df = pd.DataFrame(columns = ['A', 'B'])
df.A = [1,1,1,2,2,2,2,4,4,5]
df.B = [5,2,4,3,1,5,4,1,2,2]
What I'm currently using
d = {}
for i in df.A:
    d[i] = []
    for v in df.A[df.A == i].index:
        d[i].append(df.B[v])
Resulting in
{1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
But it's slow.
What is a pythonic way of doing this?
EDIT:
d = {}
for i in df.A.unique():
    d[i] = df[df.A == i].B.tolist()
Still seems like there must be a faster way
Thanks for any help!
You can use a DataFrame's groupby and to_dict methods, which keep the heavy work in pandas rather than in Python loops, e.g.:
import pandas as pd
df = pd.DataFrame(columns = ['A', 'B'])
df.A = [1,1,1,2,2,2,2,4,4,5]
df.B = [5,2,4,3,1,5,4,1,2,2]
d = df.groupby('A')['B'].apply(list).to_dict()
Gives you:
{1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
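To make it clearer what the one-liner does, here is the same thing split into two steps. This is only a sketch: the name lists_per_key is made up for illustration, and agg(list) is used in place of apply(list) (for this case they should give the same result).

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2, 2, 4, 4, 5],
                   "B": [5, 2, 4, 3, 1, 5, 4, 1, 2, 2]})

# Step 1: group column B by the values in A and collect each group into a list.
# The result is a Series indexed by the unique values of A.
lists_per_key = df.groupby("A")["B"].agg(list)  # hypothetical variable name

# Step 2: convert that Series into a plain dict (index value -> list).
d = lists_per_key.to_dict()
print(d)
```

Note that groupby sorts the keys by default, so the dict comes out ordered by A rather than by first appearance.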
Look at this: list to dictionary conversion with multiple values per key?
from collections import defaultdict
d = defaultdict(list)
for i, j in zip(df.A, df.B):
    d[i].append(j)
Is this ok?
EDIT: If you want, you can convert it to simple dict:
d = dict(d)
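Since the question is about speed, it may be worth timing the approaches from this thread on your own data rather than guessing. Below is a rough sketch, not a proper benchmark: the function names and the number of runs are my own choices, and the result will depend a lot on how many rows and how many distinct keys you have.

```python
import timeit
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2, 2, 4, 4, 5],
                   "B": [5, 2, 4, 3, 1, 5, 4, 1, 2, 2]})

def with_unique_loop(df):
    # Approach from the question's EDIT: one boolean mask per unique key.
    return {i: df[df.A == i].B.tolist() for i in df.A.unique()}

def with_groupby(df):
    # Approach from the groupby answer.
    return df.groupby("A")["B"].apply(list).to_dict()

def with_defaultdict(df):
    # Approach from the defaultdict answer: a single pass over both columns.
    d = defaultdict(list)
    for i, j in zip(df.A, df.B):
        d[i].append(j)
    return dict(d)

# Sanity check: all three should build the same mapping.
assert with_unique_loop(df) == with_groupby(df) == with_defaultdict(df)

for fn in (with_unique_loop, with_groupby, with_defaultdict):
    seconds = timeit.timeit(lambda: fn(df), number=200)
    print(f"{fn.__name__}: {seconds:.4f} s for 200 runs")
```

Swap in your real dataframe for df before drawing any conclusions; a 10-row frame mostly measures fixed overhead.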