Grouping data into ranges in R

Tags:

grouping

Suppose I have a data frame in R that has names of students in one column and their marks in another column. These marks range from 20 to 100.


> mydata  
id  name   marks gender  
1   a1    56     female  
2   a2    37      male

I want to divide the students into groups based on their marks, so that the difference between marks in each group should be more than 10. I tried the function table, which gives the number of students in each range (say 20-30, 30-40), but I want to pick the students whose marks fall in a given range and put all their information together in a group. Any help is appreciated.


asked Sep 07 '12 09:09

Maddy

2 Answers

I am not sure what you mean by "put all their information together in a group", but here is a way to split your original data frame into a list of data frames, where each element holds the students within a mark range of 10:


mydata <- data.frame(
  id = 1:100,
  name = paste0("a",1:100),
  marks = sample(20:100,100,TRUE),
  gender = sample(c("female","male"),100,TRUE))

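# include.lowest = TRUE keeps a mark of exactly 20 in the first group instead of dropping it as NA
split(mydata, cut(mydata$marks, seq(20, 100, by = 10), include.lowest = TRUE))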
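To see what this kind of split produces, here is a minimal sketch on a tiny data frame. The `demo` data and its values are made up for illustration, so that group membership is easy to check by eye. It shows how `cut()` labels each interval and how a single group can be pulled out of the list by name. `include.lowest = TRUE` is used so that a mark of exactly 20 lands in the first interval rather than becoming `NA`.

```r
# Hypothetical toy data (not from the question)
demo <- data.frame(
  id    = 1:6,
  name  = paste0("a", 1:6),
  marks = c(20, 25, 30, 31, 58, 100)
)

# Intervals are (lo, hi] by default; include.lowest = TRUE closes the first one at 20
bins <- cut(demo$marks, breaks = seq(20, 100, by = 10), include.lowest = TRUE)
groups <- split(demo, bins)

names(groups)         # one element per interval, including empty ones
groups[["[20,30]"]]   # all columns for students with marks from 20 to 30
sapply(groups, nrow)  # number of students in each range
```

Each list element is an ordinary data frame, so the full record of every student in a range stays together.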


answered Oct 02 '22 11:10

Sacha Epskamp

I think that @Sacha's answer should suffice for what you need to do, even if you have more than one set.

You haven't explicitly said how you want to "group" the data in your original post, and in your comment, where you've added a second dataset, you haven't explicitly said whether you plan to "merge" these first (rbind would suffice, as recommended in the comment).

So, with that, here are several options, each with different levels of detail or utility in the output. Hopefully one of them suits your needs.

First, here's some sample data.


# Two data.frames (myData1, and myData2)
set.seed(1)
myData1 <- data.frame(id = 1:20, 
                      name = paste("a", 1:20, sep = ""),
                      marks = sample(20:100, 20, replace = TRUE),
                      gender = sample(c("F", "M"), 20, replace = TRUE))
myData2 <- data.frame(id = 1:17,
                      name = paste("b", 1:17, sep = ""),
                      marks = sample(30:100, 17, replace = TRUE),
                      gender = sample(c("F", "M"), 17, replace = TRUE))

Second, different options for "grouping".

Option 1: Return (in a list) the values from myData1 and myData2 which match a given condition. For this example, you'll end up with a list of two data.frames.

```
lapply(list(myData1 = myData1, myData2 = myData2), 
       function(x) x[x$marks >= 30 & x$marks <= 50, ])
```
Option 2: Return (in a list) each dataset split into two, one for FALSE (doesn't match the stated condition) and one for TRUE (does match the stated condition). In other words, this creates four groups. For this example, you'll end up with a nested list with two list items, each with two data.frames.

```
lapply(list(myData1 = myData1, myData2 = myData2), 
       function(x) split(x, x$marks >= 30 & x$marks <= 50))
```
Option 3: More flexible than Option 1. This is essentially @Sacha's example extended to a list. You can set your breaks wherever you like, which, in my view, makes this a really convenient option. For this example, you'll end up with a nested list with two list items, each with multiple data.frames.

```
lapply(list(myData1 = myData1, myData2 = myData2),
       function(x) split(x, cut(x$marks, 
                                breaks = c(0, 30, 50, 75, 100), 
                                include.lowest = TRUE)))
```
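
As a rough sketch of how the nested result from Option 3 might be used, the example below swaps the default interval names for readable labels and then counts the rows per band. The small `d1` and `d2` data frames and the label text are assumptions made for illustration. The labels also assume whole-number marks, since "31-50" really means the interval (30, 50].

```r
# Hypothetical stand-ins for myData1 and myData2
d1 <- data.frame(name = c("a1", "a2", "a3"), marks = c(28, 45, 90))
d2 <- data.frame(name = c("b1", "b2"), marks = c(50, 76))

# labels= replaces the default interval names; the label text is illustrative
byBand <- lapply(list(d1 = d1, d2 = d2), function(x) {
  split(x, cut(x$marks,
               breaks = c(0, 30, 50, 75, 100),
               labels = c("0-30", "31-50", "51-75", "76-100"),
               include.lowest = TRUE))
})

byBand$d1[["31-50"]]                          # rows of d1 with marks in (30, 50]
sapply(byBand, function(g) sapply(g, nrow))   # counts per band, one column per dataset
```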

Option 4: Combine the data first and use the grouping method described in Option 1. For this example, you will end up with a single data.frame containing only values which match the given condition.


# Combine the data. Assumes the column names are the same in both sets
myDataALL <- rbind(myData1, myData2)
# Extract just the group of scores you're interested in
myDataALL[myDataALL$marks >= 30 & myDataALL$marks <= 50, ]
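
If it matters which dataset a row came from after combining, one possible approach is to add an identifying column before calling `rbind`. This is only a sketch: the `source` column name and the small `d1` and `d2` data frames are assumptions, not part of the original data.

```r
# Hypothetical stand-ins for myData1 and myData2
d1 <- data.frame(id = 1:3, name = paste0("a", 1:3), marks = c(25, 40, 80))
d2 <- data.frame(id = 1:2, name = paste0("b", 1:2), marks = c(35, 95))

# Tag each row with its origin; "source" is an illustrative column name
d1$source <- "myData1"
d2$source <- "myData2"

# rbind matches data frame columns by name, so both sets need the same columns
dAll <- rbind(d1, d2)

# Keep only marks from 30 to 50 (inclusive); the source column shows the origin
inRange <- dAll[dAll$marks >= 30 & dAll$marks <= 50, ]
inRange
```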

Option 5: Using the combined data, split the data into two groups: one group which matches the stated condition, one which doesn't. For this example, you will end up with a list with two data.frames.

```
split(myDataALL, myDataALL$marks >= 30 & myDataALL$marks <= 50)
```
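
Splitting on a logical condition names the two list elements "FALSE" and "TRUE", so each group can be retrieved by name. A minimal sketch, using a small made-up `dAll` data frame in place of the combined data:

```r
# Hypothetical combined data standing in for myDataALL
dAll <- data.frame(name = c("a1", "a2", "b1", "b2"), marks = c(25, 40, 35, 95))

parts <- split(dAll, dAll$marks >= 30 & dAll$marks <= 50)

names(parts)      # "FALSE" "TRUE"
parts[["TRUE"]]   # students with marks from 30 to 50
parts[["FALSE"]]  # everyone else
```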

I hope one of these options serves your needs!

answered Oct 02 '22 10:10

A5C1D2H2I1M1N2O1R2T1
