I want to separate my data set into two subsets: one half containing all values below the median, and the other half containing all values above it.
Problem: my data set has multiple observations with the same value as the median. Therefore,
v <- c(1,2,3,3,3,3,3,4)
med <- median(v)
upper <- v[which(v >= med)]
lower <- v[which(v <= med)]
doesn't work because the values equal to the median will appear in both sets and be overrepresented.
My expected output is
lower: 1,2,3,3
upper: 3,3,3,4
How can I split my dataframe by the median in R?
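Based on your requirement, we just need to split the sorted vector in half by position rather than by value. We also need to handle an odd number of elements, so we use ceiling(length(v)/2) as the split point, which puts the middle element in the lower half. (round() is not reliable here: R rounds halves to the nearest even number, so round(2.5) is 2 and round(3.5) is 4, which would drop or duplicate the middle element of an odd-length vector.)
v <- sort(v)
half <- ceiling(length(v)/2)   # split point; assumes length(v) >= 2
lower <- v[1:half]
upper <- v[(half+1):length(v)]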
lower
[1] 1 2 3 3
upper
[1] 3 3 3 4
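The positional split above can also be wrapped in a small helper. This is a minimal sketch, not the original answer's code: the name split_half is hypothetical, and sending the middle element of an odd-length vector to the lower half is an assumption you may want to reverse. It uses head() and tail() instead of index ranges so that very short vectors do not produce reversed ranges such as 2:1.

```r
# Hypothetical helper: split a vector into lower and upper halves by position.
split_half <- function(x) {
  x <- sort(x)            # sort() drops NA values by default
  n <- length(x)
  k <- ceiling(n / 2)     # assumption: middle element of an odd-length vector goes to the lower half
  list(lower = head(x, k), upper = tail(x, n - k))
}

halves <- split_half(c(1, 2, 3, 3, 3, 3, 3, 4))
halves$lower   # 1 2 3 3
halves$upper   # 3 3 3 4

# Odd length: every element lands in exactly one half.
odd <- split_half(c(5, 1, 4, 2, 3))
odd$lower      # 1 2 3
odd$upper      # 4 5
```

Because the split is by position, tied values equal to the median are divided between the two halves rather than being copied into both.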
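This solution is for data frames. Sort the rows by the column of interest, then split at ceiling(nrow(df)/2) so that, for an odd number of rows, the middle row goes to the lower half:
df <- df[order(df$var),]
half <- ceiling(nrow(df)/2)   # split point; assumes nrow(df) >= 2
lower <- df[1:half,]
upper <- df[(half+1):nrow(df),]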
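For a self-contained illustration of the data frame case, here is a sketch with made-up data. The function name split_df_half, the example data frame, and its id column are all hypothetical; only the column name var comes from the answer above. As an assumption, the middle row of an odd-sized data frame is assigned to the lower half.

```r
# Hypothetical helper: split a data frame into halves by the rank of one column.
split_df_half <- function(df, col) {
  df <- df[order(df[[col]]), , drop = FALSE]  # sort rows by the chosen column
  n <- nrow(df)
  k <- ceiling(n / 2)                         # assumption: middle row goes to the lower half
  list(lower = head(df, k), upper = tail(df, n - k))
}

# Made-up example data with several values tied at the median (3).
df <- data.frame(id = 1:8, var = c(3, 1, 3, 4, 3, 2, 3, 3))
parts <- split_df_half(df, "var")

parts$lower$var   # 1 2 3 3
parts$upper$var   # 3 3 3 4
```

Each row ends up in exactly one of the two parts, so nrow(parts$lower) + nrow(parts$upper) equals nrow(df).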
Mako212 shows the method works. See his/her post.