
Pie chart of co-occurrence of about 10 factors across clusters in R

I have a two-column dataset with about 30,000 clusters and 10 factors, like this:

cluster-1 Factor1
cluster-1 Factor2
...
cluster-2 Factor2
cluster-2 Factor3
...
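If the data are in a plain text file laid out as above, `read.table` is one way to load them into R. This is a minimal sketch that assumes whitespace-separated columns and no header row. The inline `raw` string stands in for the real file, so with your own file you would pass `file = "..."` (your own path) instead of `text = raw`.

```r
# Assumption: two whitespace-separated columns, no header line.
# `raw` is a stand-in for the real file contents.
raw = "cluster-1 Factor1
cluster-1 Factor2
cluster-2 Factor2
cluster-2 Factor3"

data = read.table(text = raw, col.names = c("cluster", "factor"),
                  stringsAsFactors = FALSE)
str(data)
```

The factor names are then character strings, so sorted combinations come out in alphabetical order (for example, "Factor10" sorts before "Factor2").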

I would like to represent the co-occurrence of factors across the set of clusters, e.g. "Factor1+Factor3+Factor5 in 1234 clusters", and so on for the other combinations. I thought I could do something like a pie chart, but with 10 factors I suspect there could be too many combinations.
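To put a number on "too many": each cluster contains some non-empty subset of the 10 factors, so there are up to 2^10 - 1 = 1023 possible combinations. A quick check in R:

```r
# Number of distinct non-empty combinations that 10 factors can form
n.factors = 10
n.combos = sum(choose(n.factors, 1:n.factors))  # equals 2^n.factors - 1
n.combos
```

Only the combinations that actually occur in the data need to be displayed, and that may be far fewer than 1023.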

What would be a good way of representing this?

Asked by 719016, Nov 04 '22 12:11


1 Answer

There is one good programming question in here that should be addressed:

How do I count the number of co-occurrences of factors in the different clusters?

First, simulate some data:

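n = 1000                  # total number of (cluster, factor) rows

set.seed(12345)           # make the simulation reproducible
n.clusters = 100
clusters = rep(1:n.clusters, length.out=n)   # 10 rows per cluster

n.factors = 10
# draw factor IDs around the middle of 1..n.factors, then clip to that range
factors = round(rnorm(n, n.factors/2, n.factors/5))
factors[factors > n.factors] = n.factors
factors[factors < 1] = 1

data = data.frame(cluster=clusters, factor=factors)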
> data
  cluster factor
1       1      6
2       2      6
3       3      5
4       4      4
5       5      6
6       6      1
...
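A quick sanity check of the simulated data can help before tabulating. The sketch below re-runs the same simulation so it works on its own, confirms that every cluster gets n / n.clusters = 10 rows and that factors stay within 1..10, and counts the distinct factors per cluster (the more distinct factors a cluster has, the more possible combinations there are). The names `rows.per.cluster` and `n.distinct` are my own.

```r
# Re-create the simulated data so this check runs on its own
n = 1000
set.seed(12345)
n.clusters = 100
clusters = rep(1:n.clusters, length.out = n)
n.factors = 10
factors = round(rnorm(n, n.factors/2, n.factors/5))
factors = pmin(pmax(factors, 1), n.factors)   # same clipping to 1..n.factors
data = data.frame(cluster = clusters, factor = factors)

# Rows per cluster (should be 10 everywhere)
rows.per.cluster = table(data$cluster)

# Number of distinct factors in each cluster, and how often each count occurs
n.distinct = tapply(data$factor, data$cluster, function(x) length(unique(x)))
table(n.distinct)
```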

Then the following code counts how many clusters contain each combination of factors:

# label each cluster by its sorted, distinct factors (e.g. "4+5+6"), then count clusters per label
counts = with(data, table(tapply(factor, cluster, function(x) paste(sort(unique(x)), collapse='+'))))
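To see what that one-liner does step by step, here is a sketch on a hand-made toy dataset. The `toy` data and the `+` separator are assumptions chosen for illustration. `tapply` builds one label per cluster from its sorted distinct factors, and `table` then counts how many clusters share each label.

```r
# Toy data: clusters 1 and 2 both contain factors 1 and 3; cluster 3 only factor 2
toy = data.frame(cluster = c(1, 1, 2, 2, 2, 3),
                 factor  = c(3, 1, 1, 3, 3, 2))

# Step 1: one label per cluster, built from its sorted distinct factors
combos = tapply(toy$factor, toy$cluster,
                function(x) paste(sort(unique(x)), collapse = "+"))
combos   # cluster 1 -> "1+3", cluster 2 -> "1+3", cluster 3 -> "2"

# Step 2: count how many clusters share each label
counts = table(combos)
counts
```

Repeated factors within a cluster (factor 3 appears twice in cluster 2) are collapsed by `unique`, so the label describes which factors are present, not how often.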

This can be drawn as a simple pie chart; for example, showing only the combinations that occur in more than one cluster:

dev.new(width=5, height=5)   # open a 5 x 5 inch graphics device
pie(counts[counts>1])        # only combinations found in more than one cluster
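Dropping the combinations that occur only once means the slices no longer add up to all clusters. One possible alternative, sketched here with a hypothetical helper `lump_rare`, merges the rare combinations into a single "other" slice instead. The small `counts` table below is made up for illustration.

```r
# Hypothetical helper: keep combinations seen in more than `min_count` clusters
# and merge the rest into one "other" slice, so the total is unchanged.
lump_rare = function(counts, min_count = 1) {
  v = setNames(as.vector(counts), names(counts))   # plain named integer vector
  rare = v <= min_count
  if (any(rare)) c(v[!rare], other = sum(v[rare])) else v
}

# Example with a small made-up table of combination counts
counts = table(c("5+6", "5+6", "4+5+6", "6", "4+6"))
slices = lump_rare(counts)
slices   # "5+6" = 2, other = 3

if (interactive()) pie(slices)
```

Raising `min_count` trades detail for readability: fewer, larger slices, with everything else summarized as "other".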


However, simple counts like this are often displayed most efficiently as a sorted table. For more on this, see the work of Edward Tufte.
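Following that suggestion, here is a sketch of a sorted table. The made-up `counts` below stands in for the real table of combination counts, and the column names `combination`, `clusters` and `percent` are my own choice.

```r
# Made-up combination counts standing in for the real `counts` table
counts = table(c("5+6", "4+5+6", "5+6", "6", "5+6", "4+5+6"))

# One row per combination, most common first
sorted.df = data.frame(combination = names(counts),
                       clusters = as.vector(counts),
                       stringsAsFactors = FALSE)
sorted.df = sorted.df[order(-sorted.df$clusters), ]
sorted.df$percent = round(100 * sorted.df$clusters / sum(sorted.df$clusters), 1)
rownames(sorted.df) = NULL
sorted.df
```

With tens of thousands of clusters, `head(sorted.df, 20)` would list the 20 most common combinations first, each with its share of all clusters.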

Answered by John Colby, Nov 09 '22 04:11