Is it possible to use the R cut() function with median value as label instead of the cut "bin"?
Here is my current code:
> hists <- lapply(data, cut, 100)
> table <- lapply(hists, table)
> head(table$V2)
(0.442,0.892] (0.892,1.3] (1.3,1.71] (1.71,2.12] (2.12,2.53]
1 4 5 7 17
What I want is:
> head(table$V2)
0.667 1.096 1.505 1.915 2.325
1 4 5 7 17
I have tried something like:
hists <- lapply(data, cut, 100, labels=(max(x)-min(x))/100)
But I have no idea how to specify the portion of the data frame that lapply is using (each of the vectors has different min and max values). Is there an easier way of doing this?
Your attempt is not far off.
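The key is that inside lapply you can define a custom function. Since lapply passes each column of your data frame to that function as x, you can use it to create custom labels for every column.
You want the median of each bin as its label. One way to approximate this is the quantile function in combination with a sequence like 0.1, 0.3, 0.5, 0.7, 0.9, which are the midpoints of the five quintile ranges. Note that cut(x, 5) creates five equal-width bins rather than quintiles, so these quantile labels only match the bin centers closely when the values are spread roughly evenly across the range: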
quants <- seq(0.1, 1, by = 0.2)
hists <- lapply(data, function(x) cut(x, 5, labels=quantile(x, quants)))
Note: If you want 100 bins instead of 5, just change quants to seq(0.005, 1, by = 0.01) and change the 5 in cut() to 100.
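If what you are after is the exact center of each bin (the 0.667, 1.096, 1.505, ... in your desired output are the midpoints of the printed intervals), a minimal sketch along these lines might help. The helper name cut_midpoints, its digits argument, and the toy data frame are made up for illustration. It builds the break points explicitly, so the outermost bins start and end exactly at min(x) and max(x); cut(x, 100) widens the range slightly, so the labels can differ a little from the ones printed in the question.

```r
# Hypothetical helper: cut x into n equal-width bins and label each bin
# with its midpoint instead of the "(a,b]" interval text.
cut_midpoints <- function(x, n, digits = 3) {
  # Explicit break points spanning the observed range of x
  breaks <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)
  # Midpoint of each bin: average of its lower and upper break
  mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
  # include.lowest = TRUE keeps the minimum value inside the first bin
  cut(x, breaks = breaks, labels = signif(mids, digits), include.lowest = TRUE)
}

# Toy data frame standing in for the real data (values are made up)
df <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 V2 = c(0, 1, 1, 2, 5, 5, 5, 9, 10, 10))

hists  <- lapply(df, cut_midpoints, n = 5)
counts <- lapply(hists, table)
counts$V2
```

With many narrow bins, rounding to 3 significant digits can produce identical labels for neighbouring bins, in which case raising digits should help.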
PS: avoid using data as a variable name in R. It is not a reserved word, but data() is already a built-in function (it loads datasets), so reusing the name can cause confusion. Rather use df or so.