I have a dataframe with 3 columns
$x -- at http://pastebin.com/SGrRUJcA $y -- at http://pastebin.com/fhn7A1rj $z -- at http://pastebin.com/VmVvdHEEthat I wish to use to generate a stacked barplot. All of these columns hold integer data. The stacked barplot should have the levels along the x-axis and the data for each level along the y-axis. The stacks should then correspond to each of $x, $y and $z.
UPDATE: I now have the following:
counted <- data.frame(table(myDf$x),variable='x')
counted <- rbind(counted,data.frame(table(myDf$y),variable='y'))
counted <- rbind(counted,data.frame(table(myDf$z),variable='z'))
counted <- counted[counted$Var1!=0,] # to get rid of 0th level??
stackedBp <- ggplot(counted,aes(x=Var1,y=Freq,fill=variable))
stackedBp <- stackedBp+geom_bar(stat='identity')+scale_x_discrete('Levels')+scale_y_continuous('Frequency')
stackedBp
which generates:
.
Two issues remain:
the x-axis labeling is not correct. For some reason, it goes: 46, 47, 53, 54, 38, 40.... How can I order it naturally?
I also wish to remove the 0th label.
I've tried using +scale_x_discrete(breaks = 0:50, labels = 1:50) but this doesn't work.
NB. axis labeling issue: Dataframe column appears incorrectly sorted
Not completely sure what you're wanting to see... but reading ?barplot says the first argument, height must be a vector or matrix. So to fix your initial error:
myDf <- data.frame(x=sample(1:10,100,replace=T),y=sample(11:20,100,replace=T),z=1:10)
barplot(as.matrix(myDf))
If you provide a reproducible example and a more specific description of your desired output you can get a better answer.
Or if I were to guess wildly (and use ggplot)...
myDf <- data.frame(x=sample(1:10,100,replace=T),y=sample(11:20,100,replace=T),z=1:10)
myDf.counted<- data.frame(table(myDf$x),variable='x')
myDf.counted <- rbind(myDf.counted,data.frame(table(myDf$y),variable='y'))
myDf.counted <- rbind(myDf.counted,data.frame(table(myDf$z),variable='z'))
ggplot(myDf.counted,aes(x=Var1,y=Freq,fill=variable))+geom_bar(stat='identity')
I'm surprised that didn't blow up in your face. Cross-classifying the joint occurrence of three different vectors each of length 35204 would often consume many gigabytes of RAM (and would possibly create lots of useless 0's as you found). Maybe you wanted to examine instead the results of sapply(myDf, table)? This then creates three separate tables of counts.
It's a rather irregular result and would need further work to get it into a matrix form but you might want to consider using densityplot to display the comparative distributions which I think is your goal.
$x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
126 711 1059 2079 3070 2716 2745 3329 2916 2671 2349 2457 2055 1303 892 692
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
559 799 482 299 289 236 156 145 100 95 121 133 60 34 37 13
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
15 12 56 10 4 7 2 14 13 28 30 20 16 62 74 58
49 50
40 15
$y
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
3069 32 1422 1376 1780 1556 1937 1844 1967 1699 1910 1924 1047 894 975 865
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
635 1002 710 908 979 848 678 908 696 491 417 412 499 411 421 217
32 33 34 35 36 37 39 42 46 47 53 54
265 182 121 47 38 11 2 2 1 1 1 4
$z
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
31 202 368 655 825 1246 900 1136 1098 1570 1613 1144 1107 1037 1239 1372
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1306 1085 843 867 813 1057 1213 1020 1210 939 725 644 617 602 739 584
32 33 34 35 36 37 38 39 40 41 42 43
650 733 756 681 684 657 544 416 220 48 7 1
The density plot is really simple to create in lattice:
densityplot( ~x+y+z, myDf)

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With