Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to pick a random element from an array that matches a certain criteria in O(1)

I have a list of recipes obtained from a database that looks like this:

List<RecipeNode> _recipeList;

RecipeNode, among other things, has a property that references one or more tags (Such as Dinner, Breakfast, Side, Vegetarian, Holiday, and about 60 others).

   public sealed class RecipeNode
   {
      public Guid RecipeId;
      public Byte[] Tags; //Tags such as 1, 5, 6, 8, 43
      //... More stuff
   }

Finding a random recipe from _recipeList in O(1) would of course be easy, however what I need to do is find a random recipe that has, say, 5 in the Tags in O(1).

Right now, my only idea is to make an array of List<RecipeNodes>, keyed by tag. For example:

List<RecipeNode>[] _recipeListByTag;

Then, _recipeListByTag[5] would contain a list of all the recipes that have a 5 in the Tags array. I could then choose a random allowed tag and a random recipe within that tag in O(1).

The drawback of this approach is the size of this multidimensional array would be Recipes * Tags (eg, the sum of Tags.length across all recipes), which starts to take up a lot of memory since I'm storing a potentially huge number of recipes in this array. Of course, since RecipeNode is a reference type, I'm only repeating the 4byte pointers to the recipes, so this still might be the best way to go.

Is there a more efficient data structure or algorithm I could use to allow me to find a random recipe that contains a certain allowed tag? Thanks!

like image 383
Mike Christensen Avatar asked Jan 28 '12 21:01

Mike Christensen


People also ask

How do I randomly select an element from an array in Python?

Use the numpy. random. choice() function to generate the random choices and samples from a NumPy multidimensional array. Using this function we can get single or multiple random numbers from the n-dimensional array with or without replacement.


3 Answers

List<RecipeNode>[] _recipeListByTag is probably the best approach for you, and its size is not Recipes * Tags because each list in the array will only contain as many recipes as match a tag, and not more. Its size would become Recipes * Tags if every single recipe contained every single tag.

If the amount of memory occupied by your data structures is so very dear to you, do not forget to call List.TrimExcess() after you have populated each list.

like image 183
Mike Nakis Avatar answered Oct 16 '22 15:10

Mike Nakis


Is this homework? I doubt any real-world recipe program would require O(1) access to tags, and be too slow for using a database. I also doubt any real-world recipe would have numeric tags. Understanding the real domain can help provide a better answer.

However, if it really is about recipes and numeric tags, and if you only have 256 tags, why don't you just choose a random recipe 1 million times? The odds of not finding a recipe with the required tag are minimal, and the complexity is still O(1). If you don't like the odds, choose a random recipe 10^20 times. The complexity is still O(1).

UPDATE:

Since it's not the O(1) you're worried about, but rather the time it takes to pick a random recipe, I suggest you let your database handle this for you - the same database that holds all the recipes anyway, and the same database you're going to access to show the random recipe.

You can SELECT a random record in SQL Server this way: SQL Server Random Sort . If you're using some other database, there are other ways: http://www.petefreitag.com/item/466.cfm . Just make sure your WHERE clause has Tag=17 in it.

like image 24
zmbq Avatar answered Oct 16 '22 15:10

zmbq


If you want to keep the data in memory, you won't do much better than a list of (4 byte) pointers for each tag. If you can use a DB... well, others have already posted about that. Depending on how huge is "huge", you might just fork out some $$$ to add RAM to the target machine.

If you do want to keep the data in memory, but want to be ridiculously parsimonious with memory, you could try to squeeze down the 4 bytes per tag-recipe combination. For example, keep all the recipes in a big array, and (assuming you won't have more than about a million) store array indexes in 3 bytes each.

To go even further, you could divide the recipes with a given tag into equally-sized "buckets" (each occupying a contiguous area of the big array), store a starting index for each "bucket" (in 3-4 bytes), and then store a list of delta values between indexes of consecutive recipes with the given tag. Encode the delta values using an array of bytes, in such a way that a single delta value can use anything from 1-4 bytes, as required.

Because the number of recipes in a "bucket" will be limited to a constant number, retrieval using this approach is still O(1).

(I have done embedded programming on micros with as little as 256 bytes of RAM... when you do that you start thinking of very creative ways to save bytes or even bits. I'm sure that going to such lengths will not be necessary in your application, but I thought this was an interesting idea.)

like image 31
Alex D Avatar answered Oct 16 '22 13:10

Alex D