Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select top N elements of related objects

I have a class Product to hold a specific instance of a given product. This class has a list of related products that are similar to main product.

class Product
{
    public string Name;
    public double Rating;
    public List<Product> RelatedProducts;
    //...
    public List<Product> GetTopRelatedProducts(int N)
    {
        // How to implement this method
        // What I did so far(Bad code)
        //     1- INFINITE RECURSION
        //     2- Can not remember visited objects
        var myList = new List<Product>();
        foreach(Product prod in RelatedProducts)
        {
             myList.AddRange(prod.GetTopRelatedProducts(N));
        }
        return myList.Distinct().OrderByDescending(x => x.Rating).Take(N).ToList();
    }
}

I want to define a method within Product class to get top N (best rated) related products. This method should take into consideration that elements in RelatedProducts list are of type Product and they also have their own RelatedProducts list. So I should keep navigating nested object until all related products were reached and after that, take the top N product. I mean, the solution would not be simply this.RelatedProducts.OrderByDescending(x => x.Rating).Take(N);

One more thing to keep in mind: Two products can be mutually related. Which means a product A can belong to RelatedProducts list of a product B and B also can belong to RelatedProducts list of product A.

Any suggestion how to solve this issue in optimal way?

Imagine, I have millions of products to maintain. How can I navigate all related products recursively and recognize already visited ones?

I tagged this as both C# and Java, as the same logic could be applied to both languages

like image 845
Mhd Avatar asked Mar 27 '17 14:03

Mhd


3 Answers

Imagine, I have millions of products to maintain. How can I navigate all related products recursively and recognize already visited ones?

It doesn't need to be recursive. Explicit Stack or Queue can serve the navigating part. For collecting the result a HashSet can be used instead of List. It would serve two purposes - to allow you to skip the already visited elements and also eliminate the need of Distinct at the end.

Here is a sample Queue based implementation:

public List<Product> GetTopRelatedProducts(int N)
{
    var relatedSet = new HashSet<Product>();
    var relatedListQueue = new Queue<List<Product>>();
    if (RelatedProducts != null && RelatedProducts.Count > 0)
        relatedListQueue.Enqueue(RelatedProducts);
    while (relatedListQueue.Count > 0)
    {
        var relatedList = relatedListQueue.Dequeue();
        foreach (var product in relatedList)
        {
            if (product != this && relatedSet.Add(product) && product.RelatedProducts != null && product.RelatedProducts.Count > 0)
                relatedListQueue.Enqueue(product.RelatedProducts);
        }
    }
    return relatedSet.OrderByDescending(x => x.Rating).Take(N).ToList();
}

Update: For completeness, here are the other possible implementations of the related set collecting part:

With explicit Stack:

public List<Product> GetTopRelatedProducts(int N)
{
    if (RelatedProducts == null || RelatedProducts.Count == 0)
        return new List<Product>();
    var relatedSet = new HashSet<Product>();
    var pendingStack = new Stack<List<Product>.Enumerator>();
    var relatedList = RelatedProducts.GetEnumerator(); 
    while (true)
    {
        while (relatedList.MoveNext())
        {
            var product = relatedList.Current;
            if (product != this && relatedSet.Add(product) && product.RelatedProducts != null && product.RelatedProducts.Count > 0)
            {
                pendingStack.Push(relatedList);
                relatedList = product.RelatedProducts.GetEnumerator();
            }
        }
        if (pendingStack.Count == 0) break;
        relatedList = pendingStack.Pop();
    } 
    return relatedSet.OrderByDescending(x => x.Rating).Take(N).ToList();
}

Although a bit more verbose than the explicit Queue based implementation, this method has less space requirements - O(height) where height is the maximum depth.

The benefit of both iterative implementations is that of course they can handle much bigger depth than the recursive solutions which can lead to StackOverflowExpection. But if the depth is not expected to be so big and you prefer recursion, then here are a couple recursive implementations (all they need to have access to the relatedSet and this):

With classic private recursive method:

public List<Product> GetTopRelatedProducts(int N)
{
    var relatedSet = new HashSet<Product>();
    GetRelatedProducts(this, relatedSet);
    return relatedSet.OrderByDescending(x => x.Rating).Take(N).ToList();
}

private void GetRelatedProducts(Product product, HashSet<Product> relatedSet)
{
    if (product.RelatedProducts == null) return;
    foreach (var item in product.RelatedProducts)
        if (item != this && relatedSet.Add(item))
            GetRelatedProducts(item, relatedSet);
}

With recursive lambda:

public List<Product> GetTopRelatedProductsD(int N)
{
    var relatedSet = new HashSet<Product>();
    Action<Product> GetRelatedProducts = null;
    GetRelatedProducts = product =>
    {
        if (product.RelatedProducts == null) return;
        foreach (var item in product.RelatedProducts)
            if (item != this && relatedSet.Add(item))
                GetRelatedProducts(item);
    };
    GetRelatedProducts(this);
    return relatedSet.OrderByDescending(x => x.Rating).Take(N).ToList();
}

Last, but not least, with the latest C# 7.0 addition - recursive local function:

public List<Product> GetTopRelatedProducts(int N)
{
    var relatedSet = new HashSet<Product>();
    GetRelatedProducts(this);
    return relatedSet.OrderByDescending(x => x.Rating).Take(N).ToList();

    void GetRelatedProducts(Product product)
    {
        if (product.RelatedProducts == null) return;
        foreach (var item in product.RelatedProducts)
            if (item != this && relatedSet.Add(item))
                GetRelatedProducts(item);
    }
}

All these methods handle (IMO) optimally the collecting part. The top N part of course is not optimal - O(N*log(N)) and can be optimized as mentioned in @Amit Kumar's answer, but it would require implementing a missing standard data structure, which is outside the scope of a SO answer.

like image 130
Ivan Stoev Avatar answered Nov 19 '22 05:11

Ivan Stoev


I would recommend using a priority queue(min heap) of fixed size N. Build the priority queue the time you build the list, so after initial build operation, the priority queue will have top N rated products. Subsequent addition/deletion can be done by checking the checking the top element in the priority queue in O(log(N)).

Pseudo Code: New element to be Added E

while PQ.size < N
     PQ.enqueue(E)
if PQ.size == N
   Etop = PQ.top() < Min heap element >
   if E.rating > Etop.rating 
      PQ.dequeu()
      PQ.enqueue(E)

To get the top N elements , just iterate through the PQ.

like image 36
Amit Kumar Avatar answered Nov 19 '22 06:11

Amit Kumar


My solution :

public List<Product> GetTopRelatedProducts(int N)
{
     List<Product> visitedProducts = new List<Product>();
     Queue<Product> ProductsQueue = new Queue<Product>();
     visitedProducts.add(this);
     foreach (product prod in relatedProducts)
         if(prod != this) //if a product can't be related to itself then remove this if statement
             ProductsQueue.Enqueue(prod); 

     //for now visitedproducts contains only our main product and ProductsQueue contains the product related to it.


     while (ProductsQueue.count > 0)
     {
          Product p = ProductsQueue.Dequeue();
          visitedProducts.add(p);
          foreach (product prod in p.relatedProducts)
          {
              if( ! visitedProduct.contains(prod) && !ProductsQueue.contains(prod))//if we haven't visited the product already or if it is not in the queue so we are going to visit it.
                  ProductsQueue.Enqueue(prod);
          }

     }
     //now visitedProducts contains all the products that are related (somehow :P) to your first product

     visitedProducts.remove(this);// to remove the main product from the results
     //all what is left to do is to take the top N products.
     return visitedProducts.OrderByDescending(x => x.Rating).Take(N).ToList();
}

I tried to make it as simple as possible ;)

like image 2
Abdou Avatar answered Nov 19 '22 06:11

Abdou