I have a linked list of objects each containing a 32-bit integer (and provably fewer than 2^32 such objects) and I want to efficiently choose an integer that's not present in the list, without using any additional storage (so copying them to an array, sorting the array, and choosing the minimum value not in the array would not be an option). However, the definition of the structure for list elements is under my control, so I could add (within reason) additional storage to each element as part of solving the problem. For example, I could add an extra set of prev/next pointers and merge-sort the list. Is this the best solution? Or is there a simpler or more efficient way to do it?
Given the conditions that you outline in the comments, especially your expectation of many identical values, you must expect a sparse distribution of used values.
Consequently, it might actually be best to just guess a value randomly and then check whether it coincides with a value in the list. Even if half the available value range were used (which seems extremely unlikely from your comments), you would only traverse the list twice on average. And you can drastically decrease this factor by simultaneously checking a number of guesses in one pass. Done correctly, the factor should always be close to one.
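A rough calculation behind these figures, assuming the guesses are drawn uniformly and independently and that a fraction $p$ of the 32-bit range is in use ($k$ is the number of guesses checked per pass):

```latex
\Pr[\text{all } k \text{ guesses are taken}] = p^{k},
\qquad
\mathbb{E}[\text{passes}] = \frac{1}{1 - p^{k}}
```

For $p = 1/2$ this gives 2 passes for $k = 1$, $16/15 \approx 1.07$ for $k = 4$, and about $1.00002$ for $k = 16$.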
The advantage of such a probabilistic approach is that you are immune to bad sequences of values. Such sequences are always possible with range-based approaches: if you calculate the min and max of the data, you run the risk that the data contains both 0 and 2^32-1. If you sequentially subdivide an interval, you run the risk of always getting values in the middle of the interval, which can shrink it to zero in 32 steps. With a probabilistic approach, these sequences can't hurt you.
I think I would use something like four guesses for very small lists, and crank it up to roughly 16 as the size of the list approaches the limit. The high starting value is due to the fact that any such algorithm will be memory bound, i.e. your CPU has ample time to check a value while it waits for the next values to arrive from memory, so you had better make good use of that time to reduce the number of passes required.
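As a minimal sketch of the batched guessing described above: the `struct element` layout, the batch size `NGUESS`, the helper `random_u32`, and the function name `pick_unused_id` are all assumptions made for illustration, and `rand()` is only a placeholder random source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NGUESS 8  /* assumed batch size; the text suggests 4 to 16 */

/* Hypothetical element layout: only `id` and `next` matter here. */
struct element {
    uint32_t id;
    struct element *next;
};

/* Placeholder 32-bit random source; rand() guarantees only 15 bits per call. */
static uint32_t random_u32(void) {
    return ((uint32_t)rand() << 30) ^ ((uint32_t)rand() << 15) ^ (uint32_t)rand();
}

/* Returns a value that does not occur as an id in the list. */
uint32_t pick_unused_id(const struct element *head) {
    for (;;) {
        uint32_t guess[NGUESS];
        bool alive[NGUESS];
        int remaining = NGUESS;

        for (int i = 0; i < NGUESS; i++) {
            guess[i] = random_u32();
            alive[i] = true;
        }

        /* One pass over the list: strike out every guess that is in use. */
        for (const struct element *e = head; e && remaining > 0; e = e->next) {
            for (int i = 0; i < NGUESS; i++) {
                if (alive[i] && guess[i] == e->id) {
                    alive[i] = false;
                    remaining--;
                }
            }
        }

        /* Any guess that survived the whole pass is unused. */
        for (int i = 0; i < NGUESS; i++)
            if (alive[i])
                return guess[i];
        /* Every guess was taken: draw a fresh batch and sweep again. */
    }
}
```

The inner loop over the guesses is cheap compared with fetching the next node from memory, which is the reason for checking several guesses per pass.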
A further optimization would instantly replace a busted guess with a new one and keep track of where the replacement happened, so that you can avoid a complete second pass through the data. Also, move the busted guess to the end of the list of guesses, so that you only need to check against the start position of the first guess in your loop to stop as early as possible.
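One possible reading of this optimization, sketched under assumptions: the list is walked circularly, each guess remembers the node at which checking began (`start`) and the step at which it was created (`born`), and the guesses are kept oldest first so that only the first one needs to be tested for completion. The names `struct guess`, `pick_unused_id_rolling`, `random_u32`, and `NGUESS` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NGUESS 8  /* assumed batch size */

/* Hypothetical element layout: only `id` and `next` matter here. */
struct element {
    uint32_t id;
    struct element *next;
};

struct guess {
    uint32_t value;
    const struct element *start;  /* first node this guess was checked against */
    uint64_t born;                /* step counter value when it was created */
};

/* Placeholder 32-bit random source; rand() guarantees only 15 bits per call. */
static uint32_t random_u32(void) {
    return ((uint32_t)rand() << 30) ^ ((uint32_t)rand() << 15) ^ (uint32_t)rand();
}

uint32_t pick_unused_id_rolling(const struct element *head) {
    if (!head)
        return random_u32();  /* empty list: every value is free */

    struct guess g[NGUESS];
    for (int i = 0; i < NGUESS; i++)
        g[i] = (struct guess){ random_u32(), head, 0 };

    const struct element *cur = head;
    for (uint64_t t = 0; ; t++) {
        /* The oldest guess has come full circle: it was checked against every node. */
        if (cur == g[0].start && t > g[0].born)
            return g[0].value;

        int i = 0;
        while (i < NGUESS) {
            if (g[i].value == cur->id) {
                /* Busted: close the gap and append a fresh guess that starts here. */
                memmove(&g[i], &g[i + 1], (size_t)(NGUESS - 1 - i) * sizeof g[0]);
                g[NGUESS - 1] = (struct guess){ random_u32(), cur, t };
                /* Do not advance i: slot i now holds a different guess. */
            } else {
                i++;
            }
        }
        cur = cur->next ? cur->next : head;  /* wrap around at the end */
    }
}
```

Because a replacement is appended at the end and the inner loop has not reached that slot yet, the new guess is also compared against the current node before the walk moves on.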
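If you can spare one pointer in each object, you get an O(n) worst-case algorithm easily (standard divide-and-conquer): split the range of possible ids into sub-ranges, partition the elements into one list per sub-range, and repeat with the sub-range that holds the fewest elements until that list is empty.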
Example code using two sub-ranges per iteration:
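/* Assumes each element has an id, the list's own link mainnext, and one spare pointer next. */
unsigned getunusedid(element* h) {
    unsigned start = 0, stop = -1; /* candidate range [start, stop]; -1 converts to UINT_MAX */
    /* Thread a scratch list through the spare pointer, keeping h at the head. */
    for(element* e = h; e; e = e->mainnext)
        e->next = e->mainnext;
    /* Invariant: h holds fewer elements than [start, stop] has values. */
    while(h) {
        element *l = 0, *r = 0;
        unsigned cl = 0, cr = 0;
        unsigned mid = start + (stop - start) / 2;
        /* Partition into ids <= mid (l) and ids > mid (r). */
        while(h) {
            element* next = h->next;
            if(h->id <= mid) {
                h->next = l;
                cl++;
                l = h;
            } else {
                h->next = r;
                cr++;
                r = h;
            }
            h = next;
        }
        /* Keep the half with fewer elements; on a tie take the left half,
           which is never the smaller sub-range. */
        if(cl <= cr) {
            h = l;
            stop = mid;
        } else {
            h = r;
            start = mid + 1;
        }
    }
    return start;
}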
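The linear bound can be checked with a short calculation. Writing $n_j$ for the length of the scratch list at the start of round $j$ (a symbol introduced here for illustration), each round keeps the sub-list with fewer elements, so the list at least halves:

```latex
n_0 = n, \qquad n_{j+1} \le \tfrac{1}{2}\, n_j
\quad\Longrightarrow\quad
\sum_{j \ge 0} n_j \;\le\; n \sum_{j \ge 0} 2^{-j} \;=\; 2n
```

Together with the initial pass that copies the links, that is at most about $3n$ node visits in total.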
Some more remarks:
Beware of bugs in the above code; I have only proved it correct, not tried it.
Without the extra pointer, each step needs a full sweep over the list, which raises the bound to O(n*lg n).
An alternative would be using 2+ extra pointers per element to maintain a balanced tree. That would speed up id-search, at the expense of some memory and insertion/removal time overhead.