I have a list of recipes obtained from a database that looks like this: <pre class="prettyprint"><code>List<RecipeNode> _recipeList; </code></pre> <code>RecipeNode</code>, among other things, has a property that references one or more tags (Such as Dinner, Breakfast, Side, Vegetarian, Holiday, and about 60 others). <pre class="prettyprint"><code> public sealed class RecipeNode { public Guid RecipeId; public Byte[] Tags; //Tags such as 1, 5, 6, 8, 43 //... More stuff } </code></pre> Finding a random recipe from <code>_recipeList</code> in O(1) would of course be easy, however what I need to do is find a random recipe that has, say, 5 in the <code>Tags</code> in O(1). Right now, my only idea is to make an array of <code>List<RecipeNodes></code>, keyed by tag. For example: <pre class="prettyprint"><code>List<RecipeNode>[] _recipeListByTag; </code></pre> Then, <code>_recipeListByTag[5]</code> would contain a list of all the recipes that have a 5 in the <code>Tags</code> array. I could then choose a random allowed tag and a random recipe within that tag in O(1). The drawback of this approach is the size of this multidimensional array would be <code>Recipes * Tags</code> (eg, the sum of Tags.length across all recipes), which starts to take up a lot of memory since I'm storing a potentially huge number of recipes in this array. Of course, since <code>RecipeNode</code> is a reference type, I'm only repeating the 4byte pointers to the recipes, so this still might be the best way to go. Is there a more efficient data structure or algorithm I could use to allow me to find a random recipe that contains a certain allowed tag? Thanks!

<code>List<RecipeNode>[] _recipeListByTag</code> is probably the best approach for you, and its size is not <code>Recipes * Tags</code> because each list in the array will only contain as many recipes as match a tag, and not more. Its size would become <code>Recipes * Tags</code> if every single recipe contained every single tag. If the amount of memory occupied by your data structures is so very dear to you, do not forget to call List.TrimExcess() after you have populated each list.

Is this homework? I doubt any real-world recipe program would require O(1) access to tags, and be too slow for using a database. I also doubt any real-world recipe would have numeric tags. Understanding the real domain can help provide a better answer. However, if it really is about recipes and numeric tags, and if you only have 256 tags, why don't you just choose a random recipe 1 million times? The odds of not finding a recipe with the required tag are minimal, and the complexity is still O(1). If you don't like the odds, choose a random recipe 10^20 times. The complexity is still O(1). UPDATE: Since it's not the O(1) you're worried about, but rather the time it takes to pick a random recipe, I suggest you let your database handle this for you - the same database that holds all the recipes anyway, and the same database you're going to access to show the random recipe. You can <code>SELECT</code> a random record in SQL Server this way: SQL Server Random Sort . If you're using some other database, there are other ways: http://www.petefreitag.com/item/466.cfm . Just make sure your <code>WHERE</code> clause has <code>Tag=17</code> in it.

How to pick a random element from an array that matches a certain criteria in O(1)

Tags:

c#

algorithm

data-structures

I have a list of recipes obtained from a database that looks like this:

List<RecipeNode> _recipeList;

RecipeNode, among other things, has a property that references one or more tags (Such as Dinner, Breakfast, Side, Vegetarian, Holiday, and about 60 others).

   public sealed class RecipeNode
   {
      public Guid RecipeId;
      public Byte[] Tags; //Tags such as 1, 5, 6, 8, 43
      //... More stuff
   }

Finding a random recipe from _recipeList in O(1) would of course be easy, however what I need to do is find a random recipe that has, say, 5 in the Tags in O(1).

Right now, my only idea is to make an array of List<RecipeNodes>, keyed by tag. For example:

List<RecipeNode>[] _recipeListByTag;

Then, _recipeListByTag[5] would contain a list of all the recipes that have a 5 in the Tags array. I could then choose a random allowed tag and a random recipe within that tag in O(1).

The drawback of this approach is the size of this multidimensional array would be Recipes * Tags (eg, the sum of Tags.length across all recipes), which starts to take up a lot of memory since I'm storing a potentially huge number of recipes in this array. Of course, since RecipeNode is a reference type, I'm only repeating the 4byte pointers to the recipes, so this still might be the best way to go.

Is there a more efficient data structure or algorithm I could use to allow me to find a random recipe that contains a certain allowed tag? Thanks!

383

asked Jan 28 '12 21:01

Mike Christensen

3 Answers

List<RecipeNode>[] _recipeListByTag is probably the best approach for you, and its size is not Recipes * Tags because each list in the array will only contain as many recipes as match a tag, and not more. Its size would become Recipes * Tags if every single recipe contained every single tag.

If the amount of memory occupied by your data structures is so very dear to you, do not forget to call List.TrimExcess() after you have populated each list.

183

answered Oct 16 '22 15:10

Mike Nakis

Is this homework? I doubt any real-world recipe program would require O(1) access to tags, and be too slow for using a database. I also doubt any real-world recipe would have numeric tags. Understanding the real domain can help provide a better answer.

However, if it really is about recipes and numeric tags, and if you only have 256 tags, why don't you just choose a random recipe 1 million times? The odds of not finding a recipe with the required tag are minimal, and the complexity is still O(1). If you don't like the odds, choose a random recipe 10^20 times. The complexity is still O(1).

UPDATE:

Since it's not the O(1) you're worried about, but rather the time it takes to pick a random recipe, I suggest you let your database handle this for you - the same database that holds all the recipes anyway, and the same database you're going to access to show the random recipe.

You can SELECT a random record in SQL Server this way: SQL Server Random Sort . If you're using some other database, there are other ways: http://www.petefreitag.com/item/466.cfm . Just make sure your WHERE clause has Tag=17 in it.

answered Oct 16 '22 15:10

zmbq

If you want to keep the data in memory, you won't do much better than a list of (4 byte) pointers for each tag. If you can use a DB... well, others have already posted about that. Depending on how huge is "huge", you might just fork out some $$$ to add RAM to the target machine.

If you do want to keep the data in memory, but want to be ridiculously parsimonious with memory, you could try to squeeze down the 4 bytes per tag-recipe combination. For example, keep all the recipes in a big array, and (assuming you won't have more than about a million) store array indexes in 3 bytes each.

To go even further, you could divide the recipes with a given tag into equally-sized "buckets" (each occupying a contiguous area of the big array), store a starting index for each "bucket" (in 3-4 bytes), and then store a list of delta values between indexes of consecutive recipes with the given tag. Encode the delta values using an array of bytes, in such a way that a single delta value can use anything from 1-4 bytes, as required.

Because the number of recipes in a "bucket" will be limited to a constant number, retrieval using this approach is still O(1).

(I have done embedded programming on micros with as little as 256 bytes of RAM... when you do that you start thinking of very creative ways to save bytes or even bits. I'm sure that going to such lengths will not be necessary in your application, but I thought this was an interesting idea.)

answered Oct 16 '22 13:10

Alex D

Related questions
                            
                                How to use partial config files
                            
                                LINQ - get total count of values from nested list
                            
                                SystemEvent.TimeChange showing same time even timezone changes
                            
                                Winforms: Problems validating a cell in a datagridview
                            
                                Database already exist. Choose a Different Name using CreateDatabase()
                            
                                Every array of enum implements IEnumerable<EveryOtherEnum>. How do I work around this?
                            
                                How can I restart IIS from C# code running as a user who is an administrator?
                            
                                IoC with value type and object type dependencies
                            
                                Exception when adding Dictionary entry
                            
                                Task cancellation best practices
                            
                                How to check programmatically if windows is up-to-date?
                            
                                Connecting to PowerPivot with C#
                            
                                View Exception.Data in Watch Window while debugging
                            
                                How can I use "Linq to Objects" to put a set of contiguous dates in one group?
                            
                                Stop Autofac module registering already-registered components
                            
                                unit / integration testing and restoring databases
                            
                                LINQ - Different syntax style, different result?
                            
                                Can I modify a Win7 environment to allow a .Net program to always run as Administrator?
                            
                                FindWindow with partially known title
                            
                                Querying data using Entity Framework from dynamically created table

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With