I am porting some C++ code to Python and one of the data structures is a multiset, but I am not sure how to model this in Python.
Let ms
be the C++ multiset<int>
How ms
is used (posting some examples)
multiset<int>::iterator it = ms.find(x) ms.erase(it) ms.insert(x) ms.end() ms.lower_bound(x) ms.clear()
Multiset package is similar to the Python set but it allows elements to occur multiple times. Implementation can be based on dictionary elements( It internally uses a dict for storage) to their multiplicity in the multisets.
This package provides a multiset implementation for Python. A multiset is similar to the builtin set , but it allows an element to occur multiple times. It is an unordered collection of element which have to be hashable just like in a set .
Multisets are a type of associative containers similar to the set, with the exception that multiple elements can have the same values.
In case of Set, data is stored in sorted order. In case of MultiSet also the data is stored in sorted order.
There isn't. See Python's standard library - is there a module for balanced binary tree? for a general discussion of the equivalents of C++ tree containers (map
, set
, multimap
, multiset
) in Python.
The closest I can think of is to use a dictionary mapping integers to counts (also integers). However this doesn't get you the keys in order, so you can't search using lower_bound
. An alternative is a to use an ordered list, as suggested by others already, maybe a list of (integer, count) tuples? If you only need to search after you've done all your insertions, you could use the dictionary as a temporary structure for construction, build the list after you've done all the insertions, then use the list for searching.
There are a couple implementations of sorted list data types which would fit your criteria. Two popular choices are SortedContainers and blist modules. Each of these modules provides a SortedList data type which automatically maintains the elements in sorted order and would allow for fast insertion and lower/upper bound lookups. There's a performance comparison that's helpful too.
The equivalent code using the SortedList type from the SortedContainers module would be:
from sortedcontainers import SortedList sl = SortedList() # Start index of `x` values start = sl.bisect_left(x) # End index of `x` values end = sl.bisect_right(x) # Iterator for those values iter(sl[start:end]) # Erase an element del sl[start:end] # Insert an element sl.add(x) # Iterate from lower bound start = sl.bisect_left(x) iter(sl[x] for x in range(start, len(sl))) # Clear elements sl.clear()
All of those operations should work efficiently on a sorted list data type.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With