Cut() error - 'breaks' are not unique

Question

I have the following data frame:

 a         
    ID   a.1    b.1     a.2   b.2
1    1  40.00   100.00  NA    88.89
2    2  100.00  100.00  100   100.00
3    3  50.00   100.00  75    100.00
4    4  66.67   59.38   NA    59.38
5    5  37.50   100.00  NA    100.00
6    6  100.00  100.00  100   100.00

When I apply the following code to this dataframe:

 temp <- do.call(rbind,strsplit(names(a)[-1],".",fixed=TRUE))
 dup.temp <- temp[duplicated(temp[,1]),]

 res <- lapply(dup.temp[,1],function(i) {
 breaks <- c(-Inf,quantile(a[,paste(i,1,sep=".")], na.rm=T),Inf)
 cut(a[,paste(i,2,sep=".")],breaks)
 })

the cut() function gives an error:

 Error in cut.default(a[, paste(i, 2, sep = ".")], breaks) : 
 'breaks' are not unique

However, the same code works perfectly well on a similar data frame:

 varnames <- c("ID", "a.1", "b.1", "c.1", "a.2", "b.2", "c.2")
 a <- matrix(c(1, 2, 3, 4, 5, 6, 7), 2, 7)
 colnames(a) <- varnames
 df <- as.data.frame(a)


    ID  a.1  b.1  c.1  a.2  b.2  c.2
  1  1    3    5    7    2    4    6
  2  2    4    6    1    3    5    7

 res <- lapply(dup.temp[,1],function(i) {
 breaks <- c(-Inf,quantile(a[,paste(i,1,sep=".")], na.rm=T),Inf)
 cut(a[,paste(i,2,sep=".")],breaks)
 })

 res
[[1]]
[1] (-Inf,3] (-Inf,3]
Levels: (-Inf,3] (3,3.25] (3.25,3.5] (3.5,3.75] (3.75,4] (4, Inf]

[[2]]
[1] (-Inf,5] (-Inf,5]
Levels: (-Inf,5] (5,5.25] (5.25,5.5] (5.5,5.75] (5.75,6] (6, Inf]

[[3]]
[1] (5.5,7] (5.5,7]
Levels: (-Inf,1] (1,2.5] (2.5,4] (4,5.5] (5.5,7] (7, Inf]

What is the reason for this error? How can it be fixed? Thank you.

Didzis Elferts · Accepted Answer

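You get this error because some of the quantile values for columns b.1, a.2 and b.2 coincide, so they cannot be used directly as break values in cut(), which requires unique breaks. In your code the breaks are computed from the .1 columns, so b.1 is the one that triggers the error.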

apply(a,2,quantile,na.rm=T)
       ID      a.1    b.1   a.2      b.2
0%   1.00  37.5000  59.38  75.0  59.3800
25%  2.25  42.5000 100.00  87.5  91.6675
50%  3.50  58.3350 100.00 100.0 100.0000
75%  4.75  91.6675 100.00 100.0 100.0000
100% 6.00 100.0000 100.00 100.0 100.0000
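
A quick way to see which columns are affected is to test each column's quantiles for duplicates. The sketch below rebuilds the data frame from the question (values typed in by hand from the printout) and uses anyDuplicated(); the name has_ties is only illustrative.

```r
# Data frame from the question, re-entered by hand from the printed values
a <- data.frame(
  ID  = 1:6,
  a.1 = c(40, 100, 50, 66.67, 37.5, 100),
  b.1 = c(100, 100, 100, 59.38, 100, 100),
  a.2 = c(NA, 100, 75, NA, NA, 100),
  b.2 = c(88.89, 100, 100, 59.38, 100, 100)
)

# TRUE where the default quartiles of a column contain repeated values
has_ties <- sapply(a[-1], function(col) {
  anyDuplicated(quantile(col, na.rm = TRUE)) > 0
})
has_ties
```

Only the columns actually passed to cut() as breaks matter; in the question's loop those are the .1 columns.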

One way to solve this is to wrap quantile() in unique(), which drops the repeated quantile values. The trade-off is fewer break points, and therefore fewer bins, wherever quantiles coincide.

res <- lapply(dup.temp[,1],function(i) {
  breaks <- c(-Inf,unique(quantile(a[,paste(i,1,sep=".")], na.rm=T)),Inf)
  cut(a[,paste(i,2,sep=".")],breaks)
})

[[1]]
[1] <NA>        (91.7,100]  (58.3,91.7] <NA>        <NA>        (91.7,100] 
Levels: (-Inf,37.5] (37.5,42.5] (42.5,58.3] (58.3,91.7] (91.7,100] (100, Inf]

[[2]]
[1] (59.4,100]  (59.4,100]  (59.4,100]  (-Inf,59.4] (59.4,100]  (59.4,100] 
Levels: (-Inf,59.4] (59.4,100] (100, Inf]

eddi · Answer

If you'd rather keep the number of quantiles, another option is to just add a little bit of jitter, e.g.

breaks = c(-Inf,quantile(a[,paste(i,1,sep=".")], na.rm=T),Inf)
breaks = breaks + seq_along(breaks) * .Machine$double.eps
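
One caveat worth checking: .Machine$double.eps is the spacing of doubles near 1, so for breaks around 100, as in the question, adding a few multiples of it rounds back to the same value and leaves the ties in place. A minimal sketch, assuming a jitter scaled to the magnitude of the breaks (the 1e-8 factor and the names step and scaled are arbitrary illustrative choices, not part of the original answer):

```r
# Quartiles of b.1 from the question: four of the five values are tied at 100
q <- c(59.38, 100, 100, 100, 100)
breaks <- c(-Inf, q, Inf)

# Jitter in units of .Machine$double.eps is too small to change values near 100
eps_jitter <- breaks + seq_along(breaks) * .Machine$double.eps
anyDuplicated(eps_jitter) > 0   # the ties are still present

# Assumption: scale the step by the largest finite break (1e-8 is arbitrary)
step <- 1e-8 * max(abs(q))
scaled <- breaks + seq_along(breaks) * step
anyDuplicated(scaled) == 0      # the breaks are now strictly increasing

# b.2 from the question can now be binned without the error
b2 <- c(88.89, 100, 100, 59.38, 100, 100)
binned <- cut(b2, scaled)
```

Jitter moves the break points, so a value lying within the jitter distance above a break may land in a different bin than it would with exact breaks.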

Matthew · Answer

This comes from the fact that your breaks are not unique. Instead of cut(), you can use .bincode(), which accepts a non-unique vector of breaks and returns integer bin codes rather than a factor.

x <- c(0, 0.01, 0.5, 0.99, 1)
breaks <- c(0, 0, 1, 1)
.bincode(x, breaks)
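
Applied to the question's data, this might look as follows. A minimal sketch, assuming the b.1 and b.2 values from the question (here typed in as the hypothetical vectors b1 and b2); since .bincode() returns integer codes, the interval labels that cut() would provide are lost.

```r
b1 <- c(100, 100, 100, 59.38, 100, 100)
b2 <- c(88.89, 100, 100, 59.38, 100, 100)

# Breaks with repeated quantiles, which cut() would reject
breaks <- c(-Inf, quantile(b1, na.rm = TRUE), Inf)

# .bincode() tolerates the repeats; the empty intervals are never assigned
codes <- .bincode(b2, breaks)
codes
```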

Dimitar Nentchev · Answer

If you actually mean the 10% or 25% portions of your population when you say decile, quartile etc. and not the actual numeric values of the decile/quartile buckets, you can rank your values first, and apply the quantile function on the ranks:

a <- c(1,1,1,2,3,4,5,6,7,7,7,7,99,0.5,100,54,3,100,100,100,11,11,12,11,0)
a_ranks <- rank(a, ties.method = "first")
decile <- cut(a_ranks, quantile(a_ranks, probs=0:10/10), include.lowest=TRUE, labels=FALSE)
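
One consequence of ties.method = "first" is that equal values receive different ranks and can therefore be split across neighbouring bins. A short check using the same vector (rank_breaks is an illustrative name; the other objects are those from the answer):

```r
a <- c(1,1,1,2,3,4,5,6,7,7,7,7,99,0.5,100,54,3,100,100,100,11,11,12,11,0)
a_ranks <- rank(a, ties.method = "first")
rank_breaks <- quantile(a_ranks, probs = 0:10/10)

# The ranks are a permutation of 1..25, so the decile breaks do not coincide
anyDuplicated(rank_breaks) == 0

decile <- cut(a_ranks, rank_breaks, include.lowest = TRUE, labels = FALSE)

# The four observations equal to 100 do not all share one decile
table(decile[a == 100])
```

Whether that split is acceptable depends on whether equal-sized groups or keeping equal values together matters more for the analysis.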
