I have dictionary of list as follows (it can be more than 1M elements, also assume dictionary is sorted by key)
import scipy.sparse as sp
d = {0: [0,1], 1: [1,2,3],
2: [3,4,5], 3: [4,5,6],
4: [5,6,7], 5: [7],
6: [7,8,9]}
I want to know what is the most efficient way (fastest way for large dictionary) to convert it into list of row and column index like:
r_index = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6, 6, 6]
c_index = [0, 1, 1, 2, 3, 3, 4, 5, 4, 5, 6, 5, 6, 7, 7, 7, 8, 9]
Here are some solutions that I have so far:
Using iteration
row_ind = [k for k, v in d.iteritems() for _ in range(len(v))] # or d.items() in Python 3
col_ind = [i for ids in d.values() for i in ids]
Using pandas library
import pandas as pd
df = pd.DataFrame.from_dict(d, orient='index')
df = df.stack().reset_index()
row_ind = list(df['level_0'])
col_ind = list(df[0])
Using itertools
import itertools
indices = [(x,y) for x, y in itertools.chain.from_iterable([itertools.product((k,), v) for k, v in d.items()])]
indices = np.array(indices)
row_ind = indices[:, 0]
col_ind = indices[:, 1]
I'm not sure which way is the fastest way to deal with this problem if I have a lot of elements in my dictionary. Thanks!
The first rule of thumb of optimization in python is, to make sure that your innermost loop is outsourced to some library function. This only applies for cpython - pypy is a completely different story. In your case using extend is giving some significant speedup.
import time
l = range(10000)
x = dict([(k, list(l)) for k in range(1000)])
def org(d):
row_ind = [k for k, v in d.items() for _ in range(len(v))]
col_ind = [i for ids in d.values() for i in ids]
def ext(d):
row_ind = [k for k, v in d.items() for _ in range(len(v))]
col_ind = []
for ids in d.values():
col_ind.extend(ids)
def ext_both(d):
row_ind = []
for k, v in d.items():
row_ind.extend([k] * len(v))
col_ind = []
for ids in d.values():
col_ind.extend(ids)
functions = [org, ext, ext_both]
for func in functions:
begin = time.time()
func(x)
elapsed = time.time() - begin
print(func.__name__ + ": " + str(elapsed))
Output when using python2:
org: 0.512559890747
ext: 0.340406894684
ext_both: 0.149670124054
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With