I have been reading quite a bit on graphing libraries for Java and Javascript lately but I haven't found a good way to do what I want to do.
Essentially I have a hierarchy of sets with regards to a bunch of elements (up to several thousands). These sets can be fully or partly overlapping, fully covering or completely disjoint from one another. What I would like to do is to display the following information:
Edit: Perhaps I should give an example of what I mean by sets and elements and partially overlapping hierarchies. The following is an over-simplified version of the kind of sets I deal with (note that numbers 1
-10
and letters a
-h
and X
represent elements which are comparable to one another):
Set1 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
Set2 = {1, 2, 3, 4, 5, 6}
Set3 = {1, 2, 3}
Set4 = {1, 4, 5, 6, 7}
Set5 = {a, b, c, d, e, f, g, h}
Set6 = {a, b, c, d, e}
Set7 = {a, b, c, 7}
Set8 = {2, 4, 7, 8, c, f}
Set9 = {X}
I am not sure how I would go about displaying this information in an intuitive way. I have seen Voronoi ¹,² graphs which I really like visually, however they have a different mathematical background so I don't think I'll be able to portray the hierarchies I have in a proper manner. I would like to create these graphs during runtime (in case of Java) or using Javascript in case of HTML deployment, either is perfectly fine. One thing that is a constraint, however, is that the graphs need to be either created, or can be exportable, to high-res vector graphics.
My questions in short:
Thanks!
Edit: I potential idea I had was to layout all the elements in the universal set as a hexagonal grid with the desired color overlay, and then draw the boundaries for the sets. There are however several problems with that idea, in particular the problem of designating locations for the elements, so that the sets are not split all over the graph. Any comments/suggestions?
Yes, this is a fairly well-studied problem. What you are describing is called a hypergraph. Each element can be represented as a vertex in a graph, and the sets are the hyperedges. The problem then becomes that of visualizing hypergraphs.
Unfortunately there isn't a perfect, generalized solution to this since even the simplest graphs can have complex visualizations.
If your sets are relatively small (< 5 elements), you can use a regular graph drawing library like graphviz. To do this, simply connect all pairs of vertices within each set and color them differently. This will yield a solution similar to this:
Have you considered a 2-dimensional grid:
While this visualization method would normally be inferior to some of the more complicated ones mentioned so far, it has the virtue of actually being possible when you have thousands of elements and thousands of sets.
The trick will be to order the rows and columns in a way that puts the most information together in a way useful to the user. My instinct says that the problem you're trying to solve is to make the colored cells be as "bloblike" as possible—if each set of adjacent colored cells is called an "area", to have the least number of distinct areas and for them to have the fewest holes in them.
That is a very complicated problem in its own right, but could be at least partially solved by working up some adjacency factors for each set against every other set. What you're looking for are "islands" of closeness--so start with the pair of most alike sets, add them to the graph, and consider them a region. Recalculate your closeness numbers with the region replacing the pair it holds (averaging in some way?). Find the next most close pair of items (each item being a region or a set), and if that pair is within a certain threshold of closeness to any existing region in the graph, attach to one side of that region, otherwise create a new, separate region (again removing the pair's closeness values and recomputing for the region itself). Eventually, all sets will be added to regions, and all regions will be joined. Joining two regions can have four possibilities (flipping may be required), so which sides to attach in the graph could be calculated by the closeness of the sets on the 4 edges of the two regions.
While this may never give the optimal configuration, it should come up with something that has few regions compared to a random distribution.
Finally, some dynamic reordering might be useful, by allowing the user to select an interesting set or element, and use that as the seed for a completely rearranged graph, calculating each addition based on closeness to that element (and subsequently that region after being combined with another element), rather than overall lowest closeness of any.
Here is a diagram of the result, having done the above logic process on the example set of data in your question:
Deciding how to order the columns is complex, but basically you can get sort of reasonable results by moving columns to be adjacent when such a move won't disturb the colored block area of any already-added segments.
Additional thoughts:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With