
Splitting a data frame by median

Tags:

r

I want to separate my data set into two subsets, where one half contains the values below the median and the other half contains the values above it.

Problem: my data set has multiple observations with the same value as the median. Therefore,

v <- c(1,2,3,3,3,3,3,4)
med <- median(v)
upper <- v[which(v >= med)]
lower <- v[which(v <= med)]

doesn't work because the values equal to the median will appear in both sets and be overrepresented.
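
Counting the elements makes the overlap concrete. This is only a small sketch on the same `v`; the names `n_ties` and `n_total` are introduced here for illustration.

```r
v <- c(1, 2, 3, 3, 3, 3, 3, 4)
med <- median(v)

upper <- v[v >= med]
lower <- v[v <= med]

# Values equal to the median are counted once in each subset
n_ties  <- sum(v == med)
n_total <- length(lower) + length(upper)

# The two subsets together exceed length(v) by exactly the number of ties
n_total - length(v) == n_ties  # TRUE
```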

My expected output is

lower: 1,2,3,3
upper: 3,3,3,4

How can I split my dataframe by the median in R?

Stan Shunpike asked Jan 29 '23 21:01



2 Answers

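Based on your requirement, we just need to sort the vector and split it in half by position. We also need to handle vectors with an odd number of elements. round() is not reliable for this: R rounds halves to the even digit (round(2.5) is 2, round(3.5) is 4), so combining round(length(v)/2) with round((length(v)/2)+1) can drop or duplicate the middle element. Using ceiling(length(v)/2) instead puts the middle element in the lower half:

v <- sort(v)
half <- ceiling(length(v)/2)  # lower half gets the middle element for odd lengths
lower <- v[seq_len(half)]
upper <- v[-seq_len(half)]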

lower
[1] 1 2 3 3
upper
[1] 3 3 3 4
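
For odd lengths it is worth checking that every element lands in exactly one half. The following is a minimal sketch rather than a definitive implementation; `split_half` is a hypothetical helper name, and giving the extra element to the lower half is an assumption that can be flipped by using `floor()` instead. Note that `sort()` drops `NA` values by default.

```r
# Hypothetical helper: sort a vector and split it into two halves by position.
# Assumption: for odd lengths the lower half gets the extra element.
split_half <- function(x) {
  x <- sort(x)
  n_lower <- ceiling(length(x) / 2)
  list(
    lower = head(x, n_lower),
    upper = tail(x, length(x) - n_lower)
  )
}

halves <- split_half(c(1, 2, 3, 3, 3, 3, 3, 4))
halves$lower  # 1 2 3 3
halves$upper  # 3 3 3 4

# Odd length: no element is dropped or duplicated
odd <- split_half(c(5, 1, 4, 2, 3))
length(odd$lower) + length(odd$upper) == 5  # TRUE
```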
Mako212 answered Feb 01 '23 07:02



This solution is for data frames.

df <- df[order(df$var), ]
n_lower <- ceiling(nrow(df)/2)  # lower half gets the extra row when nrow(df) is odd
lower <- df[seq_len(n_lower), ]
upper <- df[-seq_len(n_lower), ]

Mako212's answer shows that the method works on a plain vector.
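
As a usage sketch for the data frame case, the example below labels each row by position and lets `split()` build both halves at once. The data frame `df`, its columns `id` and `var`, and the label vector `half` are made up for illustration.

```r
# Made-up example data; `var` holds the values from the question
df <- data.frame(
  id  = letters[1:8],
  var = c(3, 1, 3, 4, 3, 2, 3, 3)
)

# Sort by the column of interest, then label each row by its position
df <- df[order(df$var), ]
n_lower <- ceiling(nrow(df) / 2)
half <- ifelse(seq_len(nrow(df)) <= n_lower, "lower", "upper")

# split() returns a named list with one data frame per label
parts <- split(df, half)
parts$lower$var  # 1 2 3 3
parts$upper$var  # 3 3 3 4
```

Each row ends up in exactly one of `parts$lower` and `parts$upper`, so tied values are divided between the halves instead of being repeated.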

Stan Shunpike answered Feb 01 '23 07:02
