 

R cut function with median as labels rather than bounds

Tags: r, lapply

Is it possible to use the R cut() function with the median (middle) value of each bin as its label, instead of the interval notation?

Here is my current code:

> hists <- lapply(data, cut, 100)
> table <- lapply(hists, table)
> head(table$V2)

(0.442,0.892]   (0.892,1.3]    (1.3,1.71]   (1.71,2.12]   (2.12,2.53] 
            1             4             5             7            17 

What I want is:

 > head(table$V2)

0.667   1.096    1.505   1.915   2.325 
   1       4       5       7       17 

I have tried something like:

hists <- lapply(data, cut, 100, labels=(max(x)-min(x))/100)

But I have no idea how to refer to the column that lapply is currently working on (each column has different min and max values). Is there an easier way of doing this?

asked Dec 31 '25 18:12 by Adam Barnett



1 Answer

Your attempt is not far off.

The key is that you can pass a custom (anonymous) function to lapply. lapply calls it once for each column of your data frame, with x bound to that column, so you can build the labels from each column's own values.
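Applied to the labels in the question's desired output, which are the midpoints of the equal-width intervals (for example (0.442 + 0.892) / 2 = 0.667), a minimal sketch could look like this. cut_mid() is a hypothetical helper, and the toy data frame df stands in for the real data:

```r
# Hypothetical helper (not part of base R): cut x into n equal-width bins
# and label each bin with its midpoint instead of "(a,b]".
cut_mid <- function(x, n, digits = 3) {
  rx <- range(x, na.rm = TRUE)                      # this column's own min and max
  breaks <- seq(rx[1], rx[2], length.out = n + 1)   # n + 1 equally spaced edges
  mids <- (head(breaks, -1) + tail(breaks, -1)) / 2 # centre of each interval
  # include.lowest = TRUE puts the minimum in the first bin instead of NA
  cut(x, breaks = breaks, labels = round(mids, digits), include.lowest = TRUE)
}

# Toy data frame standing in for the question's data (an assumption)
df <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7),
                 V2 = c(0.5, 0.7, 1.1, 1.2, 1.5, 2.3, 2.5))

hists  <- lapply(df, cut_mid, n = 5)  # cut_mid() is called once per column
tables <- lapply(hists, table)
tables$V2
```

Here tables$V2 has the labels 0.7, 1.1, 1.5, 1.9 and 2.3 with counts 2, 2, 1, 0 and 2. Because the breaks are computed explicitly, the labels match the bins exactly; cut(x, n) on its own moves the outer limits away by 0.1% of the range, so its intervals differ very slightly. A constant column gives non-unique breaks, and cut() then stops with an error.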

If you want the median of each quintile as its label, you can combine the quantile() function with the probabilities 0.1, 0.3, 0.5, 0.7 and 0.9, which are the centres of the five quintiles:

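quants <- seq(0.1, 1, by = 0.2)  # 0.1, 0.3, 0.5, 0.7, 0.9
# lapply passes each column as x, so the labels use that column's own quantiles
hists  <- lapply(data, function(x) cut(x, 5, labels = quantile(x, quants)))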

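Note: if you want 100 bins instead of 5, change quants to seq(0.005, 1, by = 0.01) and the 5 in cut() to 100. Also keep in mind that cut(x, 5) splits the range of x into equal-width bins, while quantile() follows the distribution of the data, so these labels only approximate the bin centres when the data are spread roughly evenly.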
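If the bins should follow the quantiles as well, one option, assuming equal-count bins are acceptable, is to take the breaks from quantile() too and label each bin with the quantile at its centre. This is a minimal sketch with a hypothetical helper cut_quant(); it is a variant of the idea above, not the same code:

```r
# Hypothetical helper: equal-count bins whose breaks come from quantile(),
# each labelled by the quantile at the centre of the bin.
cut_quant <- function(x, n = 5, digits = 3) {
  probs  <- seq(0, 1, length.out = n + 1)            # bin edges in probability terms
  centre <- (head(probs, -1) + tail(probs, -1)) / 2  # 0.1, 0.3, ..., 0.9 when n = 5
  breaks <- quantile(x, probs, na.rm = TRUE, names = FALSE)
  labels <- round(quantile(x, centre, na.rm = TRUE, names = FALSE), digits)
  # include.lowest = TRUE keeps the minimum (equal to the first break) from becoming NA
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
}

tab <- table(cut_quant(1:10, n = 5))
tab
```

For 1:10 every bin holds two values, labelled 1.9, 3.7, 5.5, 7.3 and 9.1. It can be used the same way as above, e.g. lapply(df, cut_quant, n = 100). Columns with many repeated values can produce duplicate quantile breaks, in which case cut() stops with "'breaks' are not unique".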

PS: avoid using data as a variable name in R, because data() is already a built-in function (from the utils package), and reusing the name makes code confusing to read. Use df or similar instead.

answered Jan 03 '26 11:01 by KenHBS



