Splitting a data frame by median

Question

I want to separate my data set into two subsets, where one half contains all values below the median and the other half contains all values above the median.

Problem: my data set has multiple observations with the same value as the median. Therefore,

v <- c(1,2,3,3,3,3,3,4)
med <- median(v)
upper <- v[which(v >= med)]
lower <- v[which(v <= med)]

doesn't work because the values equal to the median will appear in both sets and be overrepresented.

My expected output is

lower: 1,2,3,3
upper: 3,3,3,4

How can I split my data frame by the median in R?

Mako212 · Accepted Answer

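Based on your requirement, we just need to split the sorted vector in half. However, we need to account for vectors with an odd number of elements, so we compute the split point with ceiling(length(v)/2), which puts the middle element in the lower half. (round() is unreliable here: R rounds halves to the even digit, so round(2.5) is 2 while round(3.5) is 4, and the middle element of an odd-length vector can end up dropped or duplicated.)

v <- sort(v)
n <- length(v)
half <- ceiling(n/2)          # size of the lower half
lower <- head(v, half)        # first half elements
upper <- tail(v, n - half)    # remaining elements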

lower
[1] 1 2 3 3
upper
[1] 3 3 3 4
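
The same split can be wrapped in a small helper and checked on an odd-length vector. This is only a minimal sketch: split_half is a hypothetical name (not a base R function), and it assumes the middle element of an odd-length vector should go to the lower half. It uses ceiling() rather than round() because R's round() rounds halves to the even digit, which can misplace the middle element. Note that sort() drops NA values by default.

```r
# split_half is a hypothetical helper name (not part of base R).
# Assumption: for odd lengths the middle element goes to the lower half.
split_half <- function(v) {
  v <- sort(v)                    # sort() drops NA values by default
  n <- length(v)
  half <- ceiling(n / 2)          # size of the lower half
  list(lower = head(v, half),     # first `half` elements
       upper = tail(v, n - half)) # remaining elements
}

# Even length: the example from the question
res <- split_half(c(1, 2, 3, 3, 3, 3, 3, 4))
res$lower  # 1 2 3 3
res$upper  # 3 3 3 4

# Odd length: every element appears exactly once
odd <- split_half(c(5, 1, 4, 2, 3))
odd$lower  # 1 2 3
odd$upper  # 4 5
```

If the middle element should go to the upper half instead, floor() can replace ceiling().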

Stan Shunpike · Answer

This solution is for data frames.

df <- df[order(df$var), ]      # sort rows by the splitting variable
n <- nrow(df)
half <- ceiling(n/2)           # odd row counts: middle row goes to lower
lower <- head(df, half)
upper <- tail(df, n - half)

Mako212's answer demonstrates the same method on a vector.
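
For a quick check of the data frame version, here is a small sketch on made-up data. The data frame df and its columns id and var are illustrative assumptions, not taken from the question. Because the split is by row position after sorting, rows tied at the median are divided between the halves rather than duplicated.

```r
# Illustrative data; `id` and `var` are made-up column names.
df <- data.frame(id = 1:8, var = c(3, 1, 3, 4, 3, 2, 3, 3))

df <- df[order(df$var), ]      # sort rows by the splitting variable
n <- nrow(df)
half <- ceiling(n / 2)         # odd row counts: middle row goes to lower

lower <- head(df, half)        # first `half` rows
upper <- tail(df, n - half)    # remaining rows

lower$var  # 1 2 3 3
upper$var  # 3 3 3 4

# Every row lands in exactly one half
stopifnot(nrow(lower) + nrow(upper) == n,
          !any(lower$id %in% upper$id))
```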
