
Creating a Hierarchy of Nested Entities in Entity Framework

I am trying to create a hierarchical representation in Entity Framework, and I can't seem to find much on the subject after searching around.

Premise: I am working on a backlink monitoring tool where I can paste in a bunch of URLs to see if they point to a specific domain. If so, I want to remove them from the list and store them as top-level (Tier 1) backlinks. After locating and removing all of the backlinks that link directly to the URL, I want to run through the remaining backlinks in the list to see if they point to any of the URLs in the newly-created top-level backlink list, and for the ones that point to the top-level backlinks, store them as Tier 2 backlinks. Then search for Tier 3 backlinks, and so on until the entire list has been checked.

I have a Website entity that contains the Url that is to be used for the first run through the list of imported backlinks. Those that are found are moved to a list, and their URLs are used when looping through the 2nd time, and so on.

I originally created a separate property in the Website entity for each "Tier" of links, but that doesn't seem to be very efficient because when trying to render the hierarchy, the code has to loop through each Tier and re-match the urls from the tiers below to recreate the actual linking structure.

End goal sample:

[Image: link hierarchy]

So I instead believe I should create a single "Backlink" model, and have each backlink entity store a list of the backlinks below it, then when trying to view the backlink hierarchy, just do a simple loop through, and loop through each sub-backlink entity.

A sample of the backlink entity is as follows:

public class Backlink
{
    public int BacklinkID { get; set; }
    public string Url { get; set; }
    public string AnchorText { get; set; }
    public string LinksTo { get; set; }

    public int PageAuthority { get; set; }
    public int PageRank { get; set; }

    public virtual ICollection<Backlink> Backlinks { get; set; }
}

I have written the code that actually goes through and checks each backlink's HTML to find if the backlink points to each specific URL, so now I'm trying to figure out the best way to store the results.

Is creating an entity that stores a list of its same type of entity a smart approach, or am I going about this all wrong? Will doing something in this way hurt the performance when querying the database?

Ideally I would like to use lazy loading and show only the top-tier backlinks at first, then, when a specific backlink is clicked, have EF make another call to go and fetch its sub-backlinks, and so on. Would this storage approach with lazy loading be smart, or should I scrap that idea and figure out a totally different schema for this?

I'm not great with EF yet so any insights on best approach would be greatly appreciated.

drewness asked Oct 20 '22 16:10



1 Answer

What you are trying to implement is called an adjacency list. Simply adding an ICollection<Backlink> Backlinks collection is fine (with a proper model configuration, of course). However, an adjacency list is not performance-friendly, and the typical EF implementation of it (exactly what you suggested) is particularly weak. There are two options:

  1. Load links level by level on demand, as you suggested. In this case the model works fine (each level is a very simple SELECT, as @Danexxtone mentioned). However, every expansion costs a request to the app server and the DB, so the user experience will probably suffer.
  2. Load the whole tree up front so that nodes can be shown to the user without delay. Doing this with EF alone means recursing over navigation collections, which is really the worst idea: it issues far too many requests to the DB.
    EF itself doesn't seem to offer more options. But you can use plain SQL (through the EF data context, by the way), and that opens up more interesting approaches:
    1. CTE (as @Jon mentioned). It works over the adjacency list without any changes to the DB structure. Not a bad option, but not the best.
    2. Tree path column. Let's number the root of the hierarchy "1", the level 1 links "2", "3", "4", and a level 2 link "5". Each node in the tree (each link) then has a unique string path such as "1/2/5/". Just add one more column, "Path", to the DB, and you can extract a sub-tree with a simple LIKE expression (or even .StartsWith in EF).
    3. hierarchyid data type. I assume you're using MS SQL Server; if so, this is an even better option. It's not supported by EF, but it provides all of the "tree path" functionality out of the box.
      I wrote that the CTE is not the best option. That's because of performance: queries over a string tree path are much more efficient (don't forget about indexes). hierarchyid performs a little better than a tree path, but its main advantage is the built-in API for tree manipulation.
      One more interesting approach is nested sets. However, I wouldn't recommend it: inserting new nodes carries too much overhead, and it's not easy to code.
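To make the tree path idea concrete, here is a minimal sketch that uses an in-memory list in place of the database table. The BacklinkRow class, the SubTree helper and the sample URLs are hypothetical names made up for illustration; only the "Path" column and the StartsWith filter come from the description above. Against a real EF context the same Where clause would presumably be translated to a LIKE 'prefix%' query, but that is an assumption to verify against your EF version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the Backlinks table with the extra Path column.
public class BacklinkRow
{
    public int BacklinkID { get; set; }
    public string Url { get; set; }
    public string Path { get; set; }   // e.g. "1/2/5/" = root 1 -> link 2 -> link 5
}

public static class TreePathDemo
{
    // Returns the node whose path equals rootPath plus every node below it.
    // The trailing '/' in rootPath keeps "1/2/" from also matching "1/25/".
    public static List<BacklinkRow> SubTree(IEnumerable<BacklinkRow> rows, string rootPath)
    {
        return rows.Where(r => r.Path.StartsWith(rootPath)).ToList();
    }

    public static void Main()
    {
        // Same numbering as in the text: root "1", level 1 links "2".."4", level 2 link "5".
        var rows = new List<BacklinkRow>
        {
            new BacklinkRow { BacklinkID = 1, Url = "http://example.com/",  Path = "1/" },
            new BacklinkRow { BacklinkID = 2, Url = "http://a.example/",    Path = "1/2/" },
            new BacklinkRow { BacklinkID = 3, Url = "http://b.example/",    Path = "1/3/" },
            new BacklinkRow { BacklinkID = 4, Url = "http://c.example/",    Path = "1/4/" },
            new BacklinkRow { BacklinkID = 5, Url = "http://d.example/",    Path = "1/2/5/" },
        };

        // Everything at or below link 2, fetched with a single prefix filter.
        var subTree = SubTree(rows, "1/2/");
        Console.WriteLine(string.Join(", ", subTree.Select(r => r.BacklinkID)));  // prints "2, 5"
    }
}
```

The cost of this design is that the path has to be computed when a node is inserted and rewritten for the whole sub-tree if a node is ever moved.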

Conclusion

If you are comfortable with SQL and with running plain SQL through EF, the best option is probably hierarchyid.
If you want to code using only EF, an adjacency list is the only option. Just do not retrieve deep sub-trees by recursively traversing navigation collections; it can really hurt.
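One way to stay with a plain adjacency list and still avoid the recursive traversal is to fetch the rows flat and link them up in memory. This is not part of the answer above, only a rough sketch of that idea: BacklinkNode, the nullable ParentID column and the Build helper are all hypothetical, and the rows are assumed to come from one flat query rather than from lazy-loaded navigation collections.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical adjacency-list row: ParentID is an assumed self-referencing FK,
// null for Tier 1 backlinks.
public class BacklinkNode
{
    public int BacklinkID { get; set; }
    public int? ParentID { get; set; }
    public string Url { get; set; }
    public List<BacklinkNode> Backlinks { get; } = new List<BacklinkNode>();
}

public static class TreeBuilder
{
    // Links each row to its parent in a single pass and returns the top-level nodes.
    // Rows whose parent is missing from the input are treated as top-level.
    public static List<BacklinkNode> Build(IEnumerable<BacklinkNode> rows)
    {
        var list = rows.ToList();
        var byId = list.ToDictionary(r => r.BacklinkID);
        var roots = new List<BacklinkNode>();

        foreach (var row in list)
        {
            if (row.ParentID.HasValue && byId.TryGetValue(row.ParentID.Value, out var parent))
                parent.Backlinks.Add(row);
            else
                roots.Add(row);
        }
        return roots;
    }

    // Writes the tree with two spaces of indentation per tier.
    public static void Print(IEnumerable<BacklinkNode> nodes, int depth = 0)
    {
        foreach (var node in nodes)
        {
            Console.WriteLine(new string(' ', depth * 2) + node.Url);
            Print(node.Backlinks, depth + 1);
        }
    }

    public static void Main()
    {
        // Stand-in for the result of one flat query over the Backlinks table.
        var rows = new List<BacklinkNode>
        {
            new BacklinkNode { BacklinkID = 1, ParentID = null, Url = "http://a.example/" },
            new BacklinkNode { BacklinkID = 2, ParentID = null, Url = "http://b.example/" },
            new BacklinkNode { BacklinkID = 3, ParentID = 1,    Url = "http://c.example/" },
            new BacklinkNode { BacklinkID = 4, ParentID = 3,    Url = "http://d.example/" },
        };

        Print(Build(rows));
    }
}
```

The recursion in Print walks objects that are already in memory, so it issues no further database requests. Loading every row is only reasonable while the table is small enough to hold in memory.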

Corwin answered Oct 24 '22 01:10
