
Searching for the first free index

I have a big array/list of 1 million IDs, and I need to find the first free ID that can be used. Assume that a couple of modules refer to this data structure: each takes an ID (at which point it is marked as used) and later returns it (at which point it is marked as free). What data structures can be used for this, and what algorithms would do it efficiently in terms of time and of space (considered separately)? Please excuse me if this has already been asked here; I did search before posting.

Rishabh Puri asked Dec 08 '22 14:12


1 Answer

One initial idea that might work would be to store a priority queue of all the unused IDs, ordered so that low IDs are dequeued before high IDs. Using a standard binary heap, this would make it possible to return an ID to the unused ID pool in O(log n) time and to take the next free ID out of the pool in O(log n) time as well (peeking at the smallest free ID is O(1); removing it is what costs O(log n)). This has the disadvantage that it requires you to explicitly store every unused ID, which could be space-inefficient if there are a huge number of IDs.
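As a rough illustration of the heap idea, here is a minimal Python sketch built on the standard `heapq` module. The class name `HeapIdPool`, its method names, and the extra `_in_use` set (which only guards against returning an ID twice) are assumptions made for this example, not something from the question.

```python
import heapq


class HeapIdPool:
    """Pool of free IDs kept in a binary min-heap (hypothetical helper)."""

    def __init__(self, size):
        # A sorted list already satisfies the heap invariant,
        # so no heapify call is needed here.
        self._free = list(range(size))
        # Assumption: track handed-out IDs to reject double releases.
        self._in_use = set()

    def acquire(self):
        """Remove and return the lowest free ID in O(log n)."""
        if not self._free:
            raise RuntimeError("no free IDs left")
        id_ = heapq.heappop(self._free)
        self._in_use.add(id_)
        return id_

    def release(self, id_):
        """Return an ID to the pool in O(log n)."""
        if id_ not in self._in_use:
            raise ValueError(f"ID {id_} is not currently in use")
        self._in_use.remove(id_)
        heapq.heappush(self._free, id_)
```

For example, after acquiring IDs 0, 1, and 2 and releasing 1, the next `acquire()` hands out 1 again rather than 3, because the heap always yields the smallest free ID.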

One potential space-saving optimization would be to coalesce consecutive ID values into ID ranges. For example, if you have free IDs 1, 3, 4, 5, 6, 8, 9, 10, and 12, you could just store the ranges 1, 3-6, 8-10, and 12. This would require you to change the underlying data structure a bit. Rather than using a binary heap, you could use a balanced binary search tree which stores the ranges. Since these ranges won't overlap, you can compare the ranges as less than, equal to, or greater than other ranges. Since BSTs are stored in sorted order, you can find the first free ID by taking the minimum element of the tree (in O(log n) time) and looking at the low end of its range. You would then update the range to exclude that first element, which might require you to remove an empty range from the tree. When returning an ID to the pool of unused IDs, you could do a predecessor and successor search to determine the ranges that come immediately before and after the ID. If either one of them could be extended to include that ID, you can just extend the range (and if both can, you merge the two ranges into one); otherwise you insert a new single-ID range. This also only takes O(log n) time.
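The range bookkeeping can be sketched in Python as follows. Note the substitution: the standard library has no balanced BST, so this sketch keeps the ranges in sorted lists and uses `bisect` for the predecessor/successor search. The search is O(log n), but inserting or deleting a range shifts list elements and is O(n) in the number of ranges, so a real balanced tree would be needed to get the O(log n) bound described above. The class and method names are hypothetical, and the sketch does not check that a released ID lies within the original bounds.

```python
import bisect


class RangeIdPool:
    """Free IDs stored as sorted, non-overlapping inclusive ranges."""

    def __init__(self, low, high):
        # Parallel sorted lists: range i covers _starts[i].._ends[i].
        self._starts = [low]
        self._ends = [high]

    def acquire(self):
        """Take the lowest free ID from the first (minimum) range."""
        if not self._starts:
            raise RuntimeError("no free IDs left")
        id_ = self._starts[0]
        if self._starts[0] == self._ends[0]:
            # The range would become empty, so drop it.
            del self._starts[0]
            del self._ends[0]
        else:
            self._starts[0] += 1
        return id_

    def release(self, id_):
        """Return an ID, extending or merging neighbouring ranges."""
        # i is the successor range; i - 1 is the predecessor range.
        i = bisect.bisect_right(self._starts, id_)
        if i > 0 and self._ends[i - 1] >= id_:
            raise ValueError(f"ID {id_} is already free")
        joins_prev = i > 0 and self._ends[i - 1] == id_ - 1
        joins_next = i < len(self._starts) and self._starts[i] == id_ + 1
        if joins_prev and joins_next:
            # The ID bridges two ranges: merge them into one.
            self._ends[i - 1] = self._ends[i]
            del self._starts[i]
            del self._ends[i]
        elif joins_prev:
            self._ends[i - 1] = id_
        elif joins_next:
            self._starts[i] = id_
        else:
            # No adjacent range: insert a new single-ID range.
            self._starts.insert(i, id_)
            self._ends.insert(i, id_)

    def ranges(self):
        """Current free ranges as (start, end) pairs."""
        return list(zip(self._starts, self._ends))
```

With this layout, a pool of 1 million IDs that is mostly free or mostly contiguous needs only a handful of range entries, instead of one entry per free ID as in the heap version.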

Hope this helps!

templatetypedef answered Dec 26 '22 01:12
