Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which data structures to use when storing multiple entities with multiple query criteria?

There is a storage unit, with has a capacity for N items. Initially this unit is empty. The space is arranged in a linear manner, i.e. one beside the other in a line. Each storage space has a number, increasing till N.

When someone drops their package, it is assigned the first available space. The packages could also be picked up, in this case the space becomes vacant. Example: If the total capacity was 4. and 1 and 2 are full the third person to come in will be assigned the space 3. If 1, 2 and 3 were full and the 2nd space becomes vacant, the next person to come will be assigned the space 2.

The packages they drop have 2 unique properties, assigned for immediate identification. First they are color coded based on their content and second they are assigned a unique identification number(UIN).

What we want is to query the system:

  1. When the input is color, show all the UIN associated with this color.
  2. When the input is color, show all the numbers where these packages are placed(storage space number).
  3. Show where an item with a given UIN is placed, i.e. storage space number.

I would like to know how which Data Structures to use for this case, so that the system works as efficiently as possible? And I am not given which of these operations os most frequent, which means I will have to optimise for all the cases.

Please take a note, even though the query process is not directly asking for storage space number, but when an item is removed from the store it is removed by querying from the storage space number.

like image 314
rd22 Avatar asked Jan 20 '17 02:01

rd22


People also ask

Which data structure can be used to store data of multiple types at once?

HashMap ah, the HashMap, the golden child of BigO complexity. In return for some memory overhead and an annoying capital letter 'M', it's an awesome data structure.

Which is the best data structure to use for different situations?

Arrays. An array is the simplest and most widely used data structure. Other data structures like stacks and queues are derived from arrays.


2 Answers

You have mentioned three queries that you want to make. Let's handle them one by one.

I cannot think of a single Data Structure that can help you with all three queries at the same time. So I'm going to give an answer that has three Data Structures and you will have to maintain all the three DS's state to keep the application running properly. Consider that as the cost of getting a respectably fast performance from your application for the desired functionality.


When the input is color, show all the UIN associated with this color.

Use a HashMap that maps Color to a Set of UIN. Whenever an item:

  • is added - See if the color is present in the HashMap. If yes, add this UIN to the set else create a new entry with a new set and add the UIN then.

  • is removed - Find the set for this color and remove this UIN from the set. If the set is now empty, you may remove this entry altogether.


When the input is color, show all the numbers where these packages are placed.

Maintain a HashMap that maps UIN to the number where an incoming package is placed. From the HashMap that we created in the previous case, you can get the list of all UINs associated with the given Color. Then using this HashMap you can get the number for each UIN which is present in the set for that Color.

So now, when a package is to be added, you will have to add the entry to previous HashMap in the specific Color bucket and to this HashMap as well. On removing, you will have to .Remove() the entry from here.


Finally,

Show where an item with a given UIN is placed.

If you have done the previous, you already have the HashMap mapping UINs to numbers. This problem is only a sub-problem of the previous one.


The third DS, as I mentioned at the top, will be a Min-Heap of ints. The heap will be initialized with the first N integers at the start. Then, as the packages will come, the heap will be polled. The number returned will represent the storage space where this package is to be put. If the storage unit is full, the heap will be empty. Whenever a package will be removed, its number will be added back to the heap. Since it is a min-heap, the minimum number will bubble up to the top, satisfying your case that when 4 and 2 are empty, the next space to be filled will be 4.


Let's do a Big O analysis of this solution for completion.

  • Time for initialization: of this setup will be O(N) because we will have to initialize a heap of N. The other two HashMaps will be empty to begin with and therefore will incur no time cost.

  • Time for adding a package: will include time to get a number and then make appropriate entries in the HashMaps. To get a number from heap will take O(Log N) time at max. Addition of entries in HashMaps will be O(1). Hence a worst case overall time of O(Log N).

  • Time for removing a package: will also be O(Log N) at worst because the time to remove from the HashMaps will be O(1) only while, the time to add the freed number back to min-heap will be upper bounded by O(Log N).

like image 56
displayName Avatar answered Nov 15 '22 14:11

displayName


This smells of homework or really bad management.

Either way, I have decided to do a version of this where you care most about query speed but don't care about memory or a little extra overhead to inserts and deletes. That's not to say that I think that I'm going to be burning memory like crazy or taking forever to insert and delete, just that I'm focusing most on queries.

Tl;DR - to solve your problem, I use a PriorityQueue, an Array, a HashMap, and an ArrayListMultimap (from guava, a common external library), each one to solve a different problem.

The following section is working code that walks through a few simple inserts, queries, and deletes. This next bit isn't actually Java, since I chopped out most of the imports, class declaration, etc. Also, it references another class called 'Packg'. That's just a simple data structure which you should be able to figure out just from the calls made to it.

Explanation is below the code

import com.google.common.collect.ArrayListMultimap;

private PriorityQueue<Integer> openSlots;
private Packg[] currentPackages;
Map<Long, Packg> currentPackageMap;
private ArrayListMultimap<String, Packg> currentColorMap;
private Object $outsideCall;


public CrazyDataStructure(int howManyPackagesPossible) {
    $outsideCall = new Object();
    this.currentPackages = new Packg[howManyPackagesPossible];
    openSlots = new PriorityQueue<>();
    IntStream.range(0, howManyPackagesPossible).forEach(i -> openSlots.add(i));//populate the open slots priority queue
    currentPackageMap = new HashMap<>();
    currentColorMap = ArrayListMultimap.create();
}

/*
 * args[0] = integer, maximum # of packages
 */
public static void main(String[] args)
{
    int howManyPackagesPossible = Integer.parseInt(args[0]);
    CrazyDataStructure cds = new CrazyDataStructure(howManyPackagesPossible);
    cds.addPackage(new Packg(12345, "blue"));
    cds.addPackage(new Packg(12346, "yellow"));
    cds.addPackage(new Packg(12347, "orange"));
    cds.addPackage(new Packg(12348, "blue"));
    System.out.println(cds.getSlotsForColor("blue"));//should be a list of {0,3}
    System.out.println(cds.getSlotForUIN(12346));//should be 1 (0-indexed, remember)
    System.out.println(cds.getSlotsForColor("orange"));//should be a list of {2}
    System.out.println(cds.removePackage(2));//should be the orange one
    cds.addPackage(new Packg(12349, "green"));
    System.out.println(cds.getSlotForUIN(12349));//should be 2, since that's open
}

public int addPackage(Packg packg)
{
    synchronized($outsideCall)
    {
        int result = openSlots.poll();
        packg.setSlot(result);
        currentPackages[result] = packg;
        currentPackageMap.put(packg.getUIN(), packg);
        currentColorMap.put(packg.getColor(), packg);
        return result;
    }
}

public Packg removePackage(int slot)
{
    synchronized($outsideCall)
    {
        if(currentPackages[slot] == null)
            return null;
        else
        {
            Packg packg = currentPackages[slot];
            currentColorMap.remove(packg.getColor(), packg);
            currentPackageMap.remove(packg.getUIN());
            currentPackages[slot] = null;
            openSlots.add(slot);//return slot to priority queue
            return packg;
        }
    }
}

public List<Packg> getUINsForColor(String color)
{
    synchronized($outsideCall)
    {
        return currentColorMap.get(color);
    }
}

public List<Integer> getSlotsForColor(String color)
{
    synchronized($outsideCall)
    {
        return currentColorMap.get(color).stream().map(packg -> packg.getSlot()).collect(Collectors.toList());
    }
}

public int getSlotForUIN(long uin)
{
    synchronized($outsideCall)
    {
        if(currentPackageMap.containsKey(uin))
            return currentPackageMap.get(uin).getSlot();
        else
            return -1;
    }
}

I use 4 different data structures in my class.

PriorityQueue I use the priority queue to keep track of all the open slots. It's log(n) for inserts and constant for removals, so that shouldn't be too bad. Memory-wise, it's not particularly efficient, but it's also linear, so that won't be too bad.

Array I use a regular Array to track by slot #. This is linear for memory, and constant for insert and delete. If you needed more flexibility in the number of slots you could have, you might have to switch this out for an ArrayList or something, but then you'd have to find a better way to keep track of 'empty' slots.

HashMap ah, the HashMap, the golden child of BigO complexity. In return for some memory overhead and an annoying capital letter 'M', it's an awesome data structure. Insertions are reasonable, and queries are constant. I use it to map between the UIDs and the slot for a Packg.

ArrayListMultimap the only data structure I use that's not plain Java. This one comes from Guava (Google, basically), and it's just a nice little shortcut to writing your own Map of Lists. Also, it plays nicely with nulls, and that's a bonus to me. This one is probably the least efficient of all the data structures, but it's also the one that handles the hardest task, so... can't blame it. this one allows us to grab the list of Packg's by color, in constant time relative to the number of slots and in linear time relative to the number of Packg objects it returns.

When you have this many data structures, it makes inserts and deletes a little cumbersome, but those methods should still be pretty straight-forward. If some parts of the code don't make sense, I'll be happy to explain more (by adding comments in the code), but I think it should be mostly fine as-is.

like image 30
Jeutnarg Avatar answered Nov 15 '22 16:11

Jeutnarg