I need a faster way to store and access around 3 GB of <code>k:v</code> pairs, where <code>k</code> is a string or an integer and <code>v</code> is an <code>np.array()</code> whose shape can vary. Is there any object that is faster than the standard Python dict for storing and accessing such a table, for example a <code>pandas.DataFrame</code>? As far as I understand, the Python dict is already a fast hash table implementation. Is there anything better for my specific case?

Is there anything faster than dict()?

2 Answers

No, there is nothing faster than a dictionary for this task, because the complexity of its indexing (getting and setting an item) and even of membership checking is O(1) on average. (The complexity of the other operations is listed on the Python wiki: https://wiki.python.org/moin/TimeComplexity )

Once your items are stored in a dictionary, you can access them in constant time, so it is unlikely that your performance problem has anything to do with dictionary indexing. That said, you may still be able to make lookups slightly faster by changing your objects and their types in ways that enable some optimizations under the hood.
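As a rough way to check the constant-time claim on your own machine, here is a minimal sketch that times a single key lookup in dictionaries of very different sizes. The helper name `time_lookup` and the chosen sizes are assumptions made for illustration. The measured numbers include the overhead of the `lambda` call, so they are only useful for comparing sizes against each other.

```python
import timeit


def time_lookup(n, repeats=200_000):
    """Return the average seconds per lookup in a dict with n integer keys."""
    table = {i: None for i in range(n)}
    key = n // 2
    # The lambda adds call overhead, but it is the same for every n.
    total = timeit.timeit(lambda: table[key], number=repeats)
    return total / repeats


# Hypothetical sizes; the per-lookup time should stay roughly flat as n grows.
results = {n: time_lookup(n) for n in (1_000, 100_000, 1_000_000)}

for n, seconds in results.items():
    print(f"{n:>9} keys: {seconds * 1e9:.1f} ns per lookup")
```

If the per-lookup time stays in the same range across sizes, the bottleneck is more likely in creating, copying, or loading the arrays than in the dictionary itself.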

For example, if your strings (keys) are not very large, you can intern both the lookup key and the dictionary's keys. Interning means keeping a single shared copy of each distinct string in memory (in Python, in the table of "interned" strings) rather than creating a separate object each time.

Python provides an intern() function in the sys module that you can use for this. From the documentation:

Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup...

also ...

If the keys in a dictionary are interned and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer comparison instead of comparing the string values themselves, which in turn reduces the access time to the object.

Here is an example:

In [48]: import sys

In [49]: d = {'mystr{}'.format(i): i for i in range(30)}

In [50]: %timeit d['mystr25']
10000000 loops, best of 3: 46.9 ns per loop

In [51]: d = {sys.intern('mystr{}'.format(i)): i for i in range(30)}

In [52]: %timeit d['mystr25']
10000000 loops, best of 3: 38.8 ns per loop
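To make the pointer-comparison point more concrete, the following sketch shows that two equal strings built at runtime are brought down to one shared object by `sys.intern()`, and that a key built at runtime can be interned before the lookup. The variable names (`parts`, `table`, `lookup_key`) are made up for this illustration, and the identity behaviour of non-interned strings is a CPython implementation detail rather than a language guarantee.

```python
import sys

# Build two equal strings at runtime so they are not merged at compile time.
parts = ["my", "str", "25"]
a = "".join(parts)
b = "".join(parts)

print(a == b)  # True: the two strings are equal by value

# After interning, both names refer to the single entry in the intern table.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True

# Intern the dictionary keys when building the table ...
table = {sys.intern("mystr{}".format(i)): i for i in range(30)}

# ... and intern keys that arrive at runtime (e.g. read from a file) as well.
lookup_key = sys.intern("mystr" + str(25))
print(table[lookup_key])  # 25
```

Interning only pays off when both sides are interned; a literal such as `'mystr25'` in source code may already be interned by CPython, but strings assembled at runtime generally are not.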

answered Oct 17 '22 15:10

Mazdak

No, I don't think there is anything faster than dict. The average-case time complexity of looking up a key is O(1).

-------------------------------------------------------
Operation    |  Average Case  | Amortized Worst Case  |
-------------------------------------------------------
Copy[2]      |    O(n)        |       O(n)            | 
Get Item     |    O(1)        |       O(n)            | 
Set Item[1]  |    O(1)        |       O(n)            | 
Delete Item  |    O(1)        |       O(n)            | 
Iteration[2] |    O(n)        |       O(n)            | 
-------------------------------------------------------

PS https://wiki.python.org/moin/TimeComplexity
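The O(n) worst-case column in the table above comes from hash collisions. As a minimal sketch of that effect, the hypothetical `CountingKey` class below counts how many `__eq__` calls a single lookup needs, once with normal hashes and once with every key forced to the same hash value. The class, the `collide` flag, and the helper `comparisons_for_lookup` are all invented for this illustration.

```python
class CountingKey:
    """Hypothetical key type that counts equality comparisons."""

    eq_calls = 0

    def __init__(self, value, collide):
        self.value = value
        self.collide = collide

    def __hash__(self):
        # With collide=True every key shares one hash, forcing collisions.
        return 0 if self.collide else hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return self.value == other.value


def comparisons_for_lookup(n, collide):
    """Count the __eq__ calls needed to look up the last-inserted key."""
    table = {CountingKey(i, collide): i for i in range(n)}
    CountingKey.eq_calls = 0  # ignore comparisons made while building
    _ = table[CountingKey(n - 1, collide)]
    return CountingKey.eq_calls


good = comparisons_for_lookup(500, collide=False)
bad = comparisons_for_lookup(500, collide=True)
print("comparisons with distinct hashes:", good)
print("comparisons with colliding hashes:", bad)
```

With built-in `str` and `int` keys, collisions on this scale do not occur in practice, which is why the average-case column is the one that matters for the question.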

answered Oct 17 '22 15:10

akash karothiya


Tags:

python

dictionary

python-3.x

python-internals

numpy

alec_djinn
