Use DataFrame. To convert pandas DataFrame to Dictionary object, use to_dict() method, this takes orient as dict by default which returns the DataFrame in format {column -> {index -> value}} . When no orient is specified, to_dict() returns in this format.
Method 1: Create DataFrame from Dictionary using default Constructor of pandas. Dataframe class. Method 2: Create DataFrame from Dictionary with user-defined indexes. Method 3: Create DataFrame from simple dictionary i.e dictionary with key and simple value like integer or string value.
Use the tolist() Method to Convert a Dataframe Column to a List. A column in the Pandas dataframe is a Pandas Series . So if we need to convert a column to a list, we can use the tolist() method in the Series . tolist() converts the Series of pandas data-frame to a list.
Using the pandas. To make a series from a dictionary, simply pass the dictionary to the command pandas. Series method. The keys of the dictionary form the index values of the series and the values of the dictionary form the values of the series.
In [9]: pd.Series(df.Letter.values,index=df.Position).to_dict()
Out[9]: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
Speed comparion (using Wouter's method)
In [6]: df = pd.DataFrame(randint(0,10,10000).reshape(5000,2),columns=list('AB'))
In [7]: %timeit dict(zip(df.A,df.B))
1000 loops, best of 3: 1.27 ms per loop
In [8]: %timeit pd.Series(df.A.values,index=df.B).to_dict()
1000 loops, best of 3: 987 us per loop
I found a faster way to solve the problem, at least on realistically large datasets using:
df.set_index(KEY).to_dict()[VALUE]
Proof on 50,000 rows:
df = pd.DataFrame(np.random.randint(32, 120, 100000).reshape(50000,2),columns=list('AB'))
df['A'] = df['A'].apply(chr)
%timeit dict(zip(df.A,df.B))
%timeit pd.Series(df.A.values,index=df.B).to_dict()
%timeit df.set_index('A').to_dict()['B']
Output:
100 loops, best of 3: 7.04 ms per loop # WouterOvermeire
100 loops, best of 3: 9.83 ms per loop # Jeff
100 loops, best of 3: 4.28 ms per loop # Kikohs (me)
In Python 3.6 the fastest way is still the WouterOvermeire one. Kikohs' proposal is slower than the other two options.
import timeit
setup = '''
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(32, 120, 100000).reshape(50000,2),columns=list('AB'))
df['A'] = df['A'].apply(chr)
'''
timeit.Timer('dict(zip(df.A,df.B))', setup=setup).repeat(7,500)
timeit.Timer('pd.Series(df.A.values,index=df.B).to_dict()', setup=setup).repeat(7,500)
timeit.Timer('df.set_index("A").to_dict()["B"]', setup=setup).repeat(7,500)
Results:
1.1214002349999777 s # WouterOvermeire
1.1922008498571748 s # Jeff
1.7034366211428602 s # Kikohs
>>> import pandas as pd
>>> df = pd.DataFrame({'Position':[1,2,3,4,5], 'Letter':['a', 'b', 'c', 'd', 'e']})
>>> dict(sorted(df.values.tolist())) # Sort of sorted...
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
>>> from collections import OrderedDict
>>> OrderedDict(df.values.tolist())
OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)])
Explaining solution: dict(sorted(df.values.tolist()))
Given:
df = pd.DataFrame({'Position':[1,2,3,4,5], 'Letter':['a', 'b', 'c', 'd', 'e']})
[out]:
Letter Position
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
Try:
# Get the values out to a 2-D numpy array,
df.values
[out]:
array([['a', 1],
['b', 2],
['c', 3],
['d', 4],
['e', 5]], dtype=object)
Then optionally:
# Dump it into a list so that you can sort it using `sorted()`
sorted(df.values.tolist()) # Sort by key
Or:
# Sort by value:
from operator import itemgetter
sorted(df.values.tolist(), key=itemgetter(1))
[out]:
[['a', 1], ['b', 2], ['c', 3], ['d', 4], ['e', 5]]
Lastly, cast the list of list of 2 elements into a dict.
dict(sorted(df.values.tolist()))
[out]:
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
Answering @sbradbio comment:
If there are multiple values for a specific key and you would like to keep all of them, it's the not the most efficient but the most intuitive way is:
from collections import defaultdict
import pandas as pd
multivalue_dict = defaultdict(list)
df = pd.DataFrame({'Position':[1,2,4,4,4], 'Letter':['a', 'b', 'd', 'e', 'f']})
for idx,row in df.iterrows():
multivalue_dict[row['Position']].append(row['Letter'])
[out]:
>>> print(multivalue_dict)
defaultdict(list, {1: ['a'], 2: ['b'], 4: ['d', 'e', 'f']})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With