I need to make a bar graph where the data is divided into bins.
My data looks like this:
1.0 5
1.2 4
2.4 1
4.3 6
5.2 10
On the X axis I want the time bins, e.g. [1-4), [4-5), etc. (depending on the data in the CSV file).
On the Y axis I want the number of occurrences in each bin, e.g. 10, 16, etc.
I have written this R code:
dataset <- read.csv("/Users/MNeptune/Documents/workspace R/BarPlot/times.csv")
dataset <- data.matrix(dataset, rownames.force = NA)
time <- dataset[,1]
occurence <- dataset[,2]
min <- min(time);
max <- max(time);
# Create the bins
Groups <- cut(x = time, breaks = seq(from = min, to = max, by = 2))
Groups <- data.matrix(Groups, rownames.force = NA)
# Group the data into the bins
Bygroup = tapply(occurence, Groups, sum)
# Plot the bins
barplot(height = Bygroup, xlab = "time", ylab = "occurence")
But the code does not bin the data correctly (the counts are wrong and the bins are not in the right order). Where is the problem?
Edit 1:
Thanks to eipi10, I have obtained this (bin length of 0.01):

Now the problem is reading the X axis labels, because I need to read off the values of the local minima.
How can I add a "scale bar" (numeric tick marks) like the one on the Y axis?
I understand I can't label every bin exactly, but could I at least have one label every 0.5?
You can see what's going wrong if you do the following:
seq(from=1.0, to=5.2, by=2)
[1] 1 3 5
cut(c(1.0,1.2,2.4,4.3,5.2), breaks=seq(from=1.0, to=5.2, by=2))
[1] <NA> (1,3] (1,3] (3,5] <NA>
Levels: (1,3] (3,5]
In other words, seq stops at the largest value that does not exceed 5.2, which is 5, so the row with time=5.2 falls outside every bin and becomes NA. In addition, cut by default builds intervals that are open on the left and closed on the right, (a,b], so a value equal to the lowest break is excluded; when you use the lowest value of time as the lowest break, you lose that row as well.
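If you want to keep the original idea (bins starting at min(time) with width 2), a minimal sketch of one way to avoid both problems is below. It assumes a fixed bin width and rounds the upper limit up to a whole number of bins so that max(time) is covered; include.lowest is a standard argument of cut that closes the first interval on the left. The names width and n_bins are only illustrative.

```r
time <- c(1.0, 1.2, 2.4, 4.3, 5.2)
occurence <- c(5, 4, 1, 6, 10)
width <- 2  # bin width, as in the question (illustrative variable name)

# Round the upper limit up to a whole number of bins so max(time) is inside the last bin
n_bins <- ceiling((max(time) - min(time)) / width)
breaks <- seq(from = min(time), to = min(time) + n_bins * width, by = width)

# include.lowest = TRUE closes the first interval on the left as well,
# so time == min(time) is kept instead of becoming NA
groups <- cut(time, breaks = breaks, include.lowest = TRUE)
bygroup <- tapply(occurence, groups, sum)
bygroup
```

Here breaks is 1 3 5 7 and every row lands in a bin, so no occurrences are lost.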
Here's a reworking of your code to get the plot you're looking for:
dat=read.table(text="time occurence
1.0 5
1.2 4
2.4 1
4.3 6
5.2 10", header=TRUE)
# Create the bins
dat$Groups <- cut(x=dat$time, breaks=seq(from=0, to=ceiling(max(dat$time)), by = 2))
# Group the data into the bins
Bygroup = tapply(dat$occurence, dat$Groups, sum)
# Plot the bins
barplot(height = Bygroup, xlab = "time", ylab = "occurence")

If you want different breaks, you can adjust the breaks argument of cut. In particular, note cut's right argument, which lets you choose whether the intervals are closed on the right, (a,b], or on the left, [a,b). right=TRUE is the default, which is why the first row of your data (time=1.0, equal to the lowest break) was excluded from Groups in your original code.
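Since the question asks for labels like [1-4), i.e. intervals closed on the left, a sketch of the same binning with right = FALSE might look like this (same data and breaks as above):

```r
dat <- data.frame(time = c(1.0, 1.2, 2.4, 4.3, 5.2),
                  occurence = c(5, 4, 1, 6, 10))

# right = FALSE gives left-closed intervals: [0,2), [2,4), [4,6)
dat$Groups <- cut(dat$time, breaks = seq(from = 0, to = 6, by = 2), right = FALSE)

# Sum the occurrences per bin; the names of the result are the interval labels
bygroup <- tapply(dat$occurence, dat$Groups, sum)
bygroup
```

With right = FALSE a value exactly equal to the last break would become NA; adding include.lowest = TRUE closes the last interval on the right in that case.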
UPDATE: To answer your follow-up question, you can find the bin of the minimum value of Bygroup as follows:
names(Bygroup)[which.min(Bygroup)]
[1] "(2,4]"
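which.min only returns the single lowest bin. Since the follow-up mentions local minima (plural), a minimal sketch for finding every bin that is strictly lower than both of its neighbours is below. The helper is_local_min is hypothetical, not a base R function, and it never reports the first or last bin because they have only one neighbour.

```r
# Bygroup from the example above: (0,2] = 9, (2,4] = 1, (4,6] = 16
bygroup <- c("(0,2]" = 9, "(2,4]" = 1, "(4,6]" = 16)

# Hypothetical helper: TRUE where a value is strictly lower than both neighbours
is_local_min <- function(x) {
  n <- length(x)
  if (n < 3) return(logical(n))  # too short to have interior points
  inner <- 2:(n - 1)
  c(FALSE, x[inner] < x[inner - 1] & x[inner] < x[inner + 1], FALSE)
}

names(bygroup)[is_local_min(bygroup)]
```

With many narrow bins (e.g. width 0.01), empty bins give NA in the tapply result, so you may want to replace those with 0 before calling the helper.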
If you want to rank the Bygroup values to find the lowest, next lowest, etc., you can use rank, which returns the rank of each value:
rank(Bygroup)
(0,2] (2,4] (4,6]
2 1 3
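The part of the follow-up about putting a numeric scale on the X axis is not covered above. One possible approach, sketched here under assumptions rather than taken from the original answer: hide the per-bin names with barplot's axisnames = FALSE, use space = 0 so bar i spans [i - 1, i] in plot coordinates, and then draw ticks every 0.5 time units with axis(), converting each time value to its bar position. The bin width of 0.01 comes from Edit 1; the variable names are illustrative.

```r
dat <- data.frame(time = c(1.0, 1.2, 2.4, 4.3, 5.2),
                  occurence = c(5, 4, 1, 6, 10))

width <- 0.01  # bin width from Edit 1 (assumption)
breaks <- seq(from = 0, to = ceiling(max(dat$time)), by = width)
dat$Groups <- cut(dat$time, breaks = breaks)
bygroup <- tapply(dat$occurence, dat$Groups, sum)
bygroup[is.na(bygroup)] <- 0  # empty bins come back as NA; plot them as 0

# space = 0: each bar is 1 unit wide with no gap, so barplot() returns
# midpoints 0.5, 1.5, ... and bar i covers [i - 1, i]
mids <- barplot(height = bygroup, xlab = "time", ylab = "occurence",
                axisnames = FALSE, space = 0, border = NA)

# One numeric tick every 0.5 time units, mapped from time to bar coordinates
ticks <- seq(from = 0, to = ceiling(max(dat$time)), by = 0.5)
axis(side = 1, at = mids[1] - 0.5 + (ticks - breaks[1]) / width, labels = ticks)
```

The same mapping works for any tick spacing, as long as the bars are drawn with space = 0 and the breaks are evenly spaced.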