Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - What is the cheapest data type to be used as "dummy value" in dict

Tags:

python

swig

I would like to ask what is the cheapest data type (in term of memory consumption and cost to hold/process it) to be used as dummy value in python dict (only key of the dict matters to me, values are just placeholder)

For examples :

d1 = {1: None, 2: None, 3: None}
d2 = {1: -1, 2: -1, 3: -1}
d3 = {1: False, 2: False, 3: False}

Here only keys (1, 2, 3) are useful to me, the values are not so they can be any value (just used as a place holder. What I want to know is what dummy data I should use here. For now I use None, but not sure if it is the "cheapest" one.

P.S., I know the best option to store only keys may be to use Set instead of dict (with dummy values). However, the reason that I am doing so is because I want to exchange data between Python and C++ using SWIG. And for now I have figured out how to pass Python dict to C++ as std::map using SWIG, but cannot find anything about how to pass Python Set to C++ as std::set...

Helps / Guidances are very appreciated here!

like image 653
Yu Wang Avatar asked Sep 24 '18 15:09

Yu Wang


2 Answers

python 3.4 64bit:

>>> import sys
>>> sys.getsizeof(None)
16
>>> sys.getsizeof(False)
24
>>> sys.getsizeof(1)
28
>>> 

So None would appear to be the best choice (I've only listed immutable objects, and disregarded strings and tuples). Note that it doesn't matter much as those objects are usually cached, so the size isn't multiplied by the number of elements of your dictionary (furthermore None is guaranteed to be a singleton)

That said, the cost of the actual object is neglectable compared to the cost of storing a reference to that object for each key/value pair. If your dictionary holds 1000 values, you have 1000 references to store, whatever the size of the value.

Conclusion: it doesn't matter much as long as you're using the same reference everywhere, and it's going to cost much more than a set anyway because of the references being stored as the values of each dictionary entry.

One possible alternative would be to pass the set as json representation (in a list, then) as a pointer of characters to the C++ side, which will parse it using a good json parser. Unless your values are big floating point values (or huge integers), that would save memory all right because the object aspect is eliminated with the serialization.

>>> json.dumps(list(set(range(4,10))))
'[4, 5, 6, 7, 8, 9]'  # hard to beat that in terms of size!
like image 192
Jean-François Fabre Avatar answered Oct 26 '22 11:10

Jean-François Fabre


You can use a set, but SWIG seems to only support passing Python lists as the set parameter (or use the named template) without writing your own typemap. Example (Windows):

test.i*

%module test

%include <std_set.i>
%template(seti) std::set<int>;

%inline %{

#include <set>
#include <iostream>
void func(std::set<int> a)
{
    for(auto i : a)
        std::cout << i << std::endl;
}

%}

Output:

>>> import set
>>> s = test.seti([1,1,2,2,3,3])  # pass named template
>>> test.func(s)
1
2
3
>>> test.func([1,2,3,3,4,4])  # pass a list that converts to a set
1
2
3
4
>>> test.func({1,1,2,2,3})   # Actual set doesn't work.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: in method 'func', argument 1 of type 'std::set< int,std::less< int >,std::allocator< int > >'
like image 33
Mark Tolonen Avatar answered Oct 26 '22 11:10

Mark Tolonen