Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Visualizing set hierarchies as color coded graphs

Tags:

I have been reading quite a bit on graphing libraries for Java and Javascript lately but I haven't found a good way to do what I want to do.

Essentially I have a hierarchy of sets with regards to a bunch of elements (up to several thousands). These sets can be fully or partly overlapping, fully covering or completely disjoint from one another. What I would like to do is to display the following information:

  • The size of a set (in relation to the other sets)
  • A "heat" value (in color code) of a set calculated from the elements it covers
  • The full topology of the sets in a single graph (so that overlaps, intersections etc are displayed to the user)

Edit: Perhaps I should give an example of what I mean by sets and elements and partially overlapping hierarchies. The following is an over-simplified version of the kind of sets I deal with (note that numbers 1-10 and letters a-h and X represent elements which are comparable to one another):

Set1 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
Set2 = {1, 2, 3, 4, 5, 6}
Set3 = {1, 2, 3}
Set4 = {1, 4, 5, 6, 7}
Set5 = {a, b, c, d, e, f, g, h}
Set6 = {a, b, c, d, e}
Set7 = {a, b, c, 7}
Set8 = {2, 4, 7, 8, c, f}
Set9 = {X}

I am not sure how I would go about displaying this information in an intuitive way. I have seen Voronoi ¹,² graphs which I really like visually, however they have a different mathematical background so I don't think I'll be able to portray the hierarchies I have in a proper manner. I would like to create these graphs during runtime (in case of Java) or using Javascript in case of HTML deployment, either is perfectly fine. One thing that is a constraint, however, is that the graphs need to be either created, or can be exportable, to high-res vector graphics.

My questions in short:

  1. Is there a nice way to visualize the kind of data I have? If so does it exist in a readily implemented form (i.e. a library)?
  2. If there is no easy solution to the problem, in other words if I need to invent my wheel in this case, how do I go about implementing such a graph myself? What is a good starting point? What do I pay extra attention to?

Thanks!

Edit: I potential idea I had was to layout all the elements in the universal set as a hexagonal grid with the desired color overlay, and then draw the boundaries for the sets. There are however several problems with that idea, in particular the problem of designating locations for the elements, so that the sets are not split all over the graph. Any comments/suggestions?

like image 482
posdef Avatar asked Jul 02 '12 15:07

posdef


2 Answers

Yes, this is a fairly well-studied problem. What you are describing is called a hypergraph. Each element can be represented as a vertex in a graph, and the sets are the hyperedges. The problem then becomes that of visualizing hypergraphs.

enter image description here

Unfortunately there isn't a perfect, generalized solution to this since even the simplest graphs can have complex visualizations.

If your sets are relatively small (< 5 elements), you can use a regular graph drawing library like graphviz. To do this, simply connect all pairs of vertices within each set and color them differently. This will yield a solution similar to this:

enter image description here

like image 143
tskuzzy Avatar answered Oct 12 '22 10:10

tskuzzy


Have you considered a 2-dimensional grid:

  • Put the set number on one axis
  • Put the unique elements found in all sets on the other axis
  • Color each cell where an element is found in a set (by looking at that row and column's labels)

While this visualization method would normally be inferior to some of the more complicated ones mentioned so far, it has the virtue of actually being possible when you have thousands of elements and thousands of sets.

The trick will be to order the rows and columns in a way that puts the most information together in a way useful to the user. My instinct says that the problem you're trying to solve is to make the colored cells be as "bloblike" as possible—if each set of adjacent colored cells is called an "area", to have the least number of distinct areas and for them to have the fewest holes in them.

That is a very complicated problem in its own right, but could be at least partially solved by working up some adjacency factors for each set against every other set. What you're looking for are "islands" of closeness--so start with the pair of most alike sets, add them to the graph, and consider them a region. Recalculate your closeness numbers with the region replacing the pair it holds (averaging in some way?). Find the next most close pair of items (each item being a region or a set), and if that pair is within a certain threshold of closeness to any existing region in the graph, attach to one side of that region, otherwise create a new, separate region (again removing the pair's closeness values and recomputing for the region itself). Eventually, all sets will be added to regions, and all regions will be joined. Joining two regions can have four possibilities (flipping may be required), so which sides to attach in the graph could be calculated by the closeness of the sets on the 4 edges of the two regions.

While this may never give the optimal configuration, it should come up with something that has few regions compared to a random distribution.

Finally, some dynamic reordering might be useful, by allowing the user to select an interesting set or element, and use that as the seed for a completely rearranged graph, calculating each addition based on closeness to that element (and subsequently that region after being combined with another element), rather than overall lowest closeness of any.

Here is a diagram of the result, having done the above logic process on the example set of data in your question:

Sets and Elements

Deciding how to order the columns is complex, but basically you can get sort of reasonable results by moving columns to be adjacent when such a move won't disturb the colored block area of any already-added segments.

Additional thoughts:

  • Calculating set closeness is not just how many elements they have in common, but also how many elements they have that are not in common. If two pairs of sets have 3 elements in common between the pairs, but one has 5 non-shared elements and the other has 3 non-shared elements, then the pair with 3 non-shared elements is a closer match than the other.
  • After adding a set to the graph, there is an opportunity to reorder the elements. Stacking the elements as leftmost as possible is a good start for the first placement. After that, stacking most common elements leftmost seems good. After that, it breaks down. I wonder if getting the colored cells as close to the diagonal (from top left to bottom right) would also be a useful algorithm--this reminds me a little of the Design Structure Matrix though that only shows one-way dependencies rather than two-way relationships.
  • When a colored blob consists of sets that are completely disjoint from all other sets (like the set containing X in your example), it can be moved to a separate graph.
like image 36
ErikE Avatar answered Oct 12 '22 08:10

ErikE