I am processing my data using R and for that purpose, I have to format my database.
data <- database %>%
  group_by(cat_a, cat_b) %>%
  mutate(
    lengths = cut(length, breaks = seq(0, max(length) + 50, by = 50)),
    heights = cut(height, breaks = seq(0, max(height) + 1, by = 1), dig.lab = 5)
  )
At this point, when I check the values calculated by cut(), I get:
unique(data$heights)
[1] (0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10] (10,11] (11,12] (12,13] <NA>
Levels: (0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10] (10,11] (11,12] (12,13]
To better explain my problem: max(height) returns 13.3, but as you can see in the Levels, the last one is (12,13]. This makes me believe that this is why there is an <NA> at the end of the first line of the result ([1]).
So I tried to fix this by extending the breaks in cut() by 1 (see max(height) + 1). However, not only do I not get a new category, I also still have the NA.
I should add that omitting the NAs is not a solution, since I believe those are the values that didn't end up in any category, i.e. values like 13.3.
Therefore, my question is: how do I fix this? How can I tell cut() to create that one extra category? I know there is an argument include.lowest = TRUE, so I am looking for the opposite: how do I include the highest? Maybe my observation is wrong, so I welcome every idea.
UPDATE
As suggested in the comments:
heights = cut(height, breaks = c(-Inf,seq(0, (max(height)+1), by = 1), Inf), dig.lab=5)
We can add -Inf and Inf to the breaks to remove the NA:
cut(..., breaks = c(-Inf, seq(0, max(frame_size), by = 1), Inf))
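To see why the padded breaks help, here is a minimal base R sketch. The height vector is made-up sample data, not the asker's database, and it assumes that one source of the NA is a value sitting exactly on the lowest break: cut() builds intervals of the form (a, b] by default, so a height of 0 falls outside (0,1]. Note also that seq(0, 13.3 + 1, by = 1) already ends at 14, so a value of 13.3 is covered by (13,14] when the breaks are computed over all rows.

```r
# Hypothetical sample data (not from the question): contains a 0 and a maximum of 13.3
height <- c(0, 0.5, 2.7, 12.9, 13.3)

# Breaks as in the question: 0, 1, ..., 14
brks <- seq(0, max(height) + 1, by = 1)

# Original approach: intervals are (a, b], so 0 is not in (0,1] and becomes NA
original <- cut(height, breaks = brks, dig.lab = 5)

# Padded breaks: every finite value now falls into some interval
padded <- cut(height, breaks = c(-Inf, brks, Inf), dig.lab = 5)

# Alternative: close the first interval on the left, giving [0,1]
closed <- cut(height, breaks = brks, include.lowest = TRUE, dig.lab = 5)

print(data.frame(height, original, padded, closed))
```

If the NA in the real data comes from zeros, include.lowest = TRUE may be the tidier option, since it avoids the extra (-Inf,0] and (14, Inf] levels. Also keep in mind that inside group_by(), max(height) is evaluated per group, so each group gets its own set of breaks; whether that matters here depends on the data.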