I have a list of recipes obtained from a database that looks like this:
List<RecipeNode> _recipeList;
RecipeNode
, among other things, has a property that references one or more tags (Such as Dinner, Breakfast, Side, Vegetarian, Holiday, and about 60 others).
public sealed class RecipeNode
{
public Guid RecipeId;
public Byte[] Tags; //Tags such as 1, 5, 6, 8, 43
//... More stuff
}
Finding a random recipe from _recipeList
in O(1) would of course be easy, however what I need to do is find a random recipe that has, say, 5 in the Tags
in O(1).
Right now, my only idea is to make an array of List<RecipeNodes>
, keyed by tag. For example:
List<RecipeNode>[] _recipeListByTag;
Then, _recipeListByTag[5]
would contain a list of all the recipes that have a 5 in the Tags
array. I could then choose a random allowed tag and a random recipe within that tag in O(1).
The drawback of this approach is the size of this multidimensional array would be Recipes * Tags
(eg, the sum of Tags.length across all recipes), which starts to take up a lot of memory since I'm storing a potentially huge number of recipes in this array. Of course, since RecipeNode
is a reference type, I'm only repeating the 4byte pointers to the recipes, so this still might be the best way to go.
Is there a more efficient data structure or algorithm I could use to allow me to find a random recipe that contains a certain allowed tag? Thanks!
Use the numpy. random. choice() function to generate the random choices and samples from a NumPy multidimensional array. Using this function we can get single or multiple random numbers from the n-dimensional array with or without replacement.
List<RecipeNode>[] _recipeListByTag
is probably the best approach for you, and its size is not Recipes * Tags
because each list in the array will only contain as many recipes as match a tag, and not more. Its size would become Recipes * Tags
if every single recipe contained every single tag.
If the amount of memory occupied by your data structures is so very dear to you, do not forget to call List.TrimExcess() after you have populated each list.
Is this homework? I doubt any real-world recipe program would require O(1) access to tags, and be too slow for using a database. I also doubt any real-world recipe would have numeric tags. Understanding the real domain can help provide a better answer.
However, if it really is about recipes and numeric tags, and if you only have 256 tags, why don't you just choose a random recipe 1 million times? The odds of not finding a recipe with the required tag are minimal, and the complexity is still O(1). If you don't like the odds, choose a random recipe 10^20 times. The complexity is still O(1).
UPDATE:
Since it's not the O(1) you're worried about, but rather the time it takes to pick a random recipe, I suggest you let your database handle this for you - the same database that holds all the recipes anyway, and the same database you're going to access to show the random recipe.
You can SELECT
a random record in SQL Server this way: SQL Server Random Sort . If you're using some other database, there are other ways: http://www.petefreitag.com/item/466.cfm . Just make sure your WHERE
clause has Tag=17
in it.
If you want to keep the data in memory, you won't do much better than a list of (4 byte) pointers for each tag. If you can use a DB... well, others have already posted about that. Depending on how huge is "huge", you might just fork out some $$$ to add RAM to the target machine.
If you do want to keep the data in memory, but want to be ridiculously parsimonious with memory, you could try to squeeze down the 4 bytes per tag-recipe combination. For example, keep all the recipes in a big array, and (assuming you won't have more than about a million) store array indexes in 3 bytes each.
To go even further, you could divide the recipes with a given tag into equally-sized "buckets" (each occupying a contiguous area of the big array), store a starting index for each "bucket" (in 3-4 bytes), and then store a list of delta values between indexes of consecutive recipes with the given tag. Encode the delta values using an array of bytes, in such a way that a single delta value can use anything from 1-4 bytes, as required.
Because the number of recipes in a "bucket" will be limited to a constant number, retrieval using this approach is still O(1).
(I have done embedded programming on micros with as little as 256 bytes of RAM... when you do that you start thinking of very creative ways to save bytes or even bits. I'm sure that going to such lengths will not be necessary in your application, but I thought this was an interesting idea.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With