How to use dplyr to group elements in x and count the frequency of x within intervals of y?

Tags:

r

dplyr

x <- c('a','v','c','a','d','e','g','f','h','y','u','r','s','w','s','d','g','j',
       'u','r','s','s','s','v','b','g','e','w','s','d','g','h','j','i','t','e',
       'w','w','q','q','d','v','b','m','m','k','l','u','p','o','r','t','n','e',
       'w','w','j','f','c','g','h','t','r','d','e','w','w','w','z','f','g','f',
       'h','h','y','r','f','f','l')
y <- sample(1:40, 79, replace = TRUE)
y
 [1] 38 18 19 19 37 38 26  4 32 23 11 24 36 15 22 19  6 24 13 36  2 26 35 39  8 33 20 19 23 28  5 17 40 26 18 21
[37] 35 23 27 12  3 33 16 32 11 19  4  5  8 19  5 19 33 33 33 13 12 32 21  4 14  8 28 34 33 22 34 19 39 23  6  8
[73] 37 17 21 16 38 15 36

(image: https://i.stack.imgur.com/UeuIv.jpg)

I have two variables, 'x' and 'y'. Values in 'x' can occur more than once, and every observation in 'x' has a corresponding value in 'y'.

I would like to group by 'x' and also partition the 'y' values into intervals.

To put it differently: for each letter, the number of times it occurs should be split across the specified intervals, according to the 'y' value attached to each occurrence.

Example:

(image of the desired output table: https://i.stack.imgur.com/P6yLi.jpg)

I could not represent the table properly, as I could not find a better way to type it here.

I hope it is clear. I shall try to restate it if needed. I would appreciate any help in this regard.

asked Nov 01 '14 10:11

user3563667

2 Answers

Using dplyr

library(dplyr)
library(tidyr)

res <- tally(group_by(df, x, y = cut(y, breaks = seq(0, 40, by = 10)))) %>%
  ungroup() %>%
  spread(y, n, fill = 0)
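
The dplyr pipeline above relies on tidyr's spread(), which has since been superseded by pivot_wider(). As a hedged sketch of the same idea in the newer idiom (not the answerer's own code), the block below uses count() and pivot_wider() on a small made-up data frame. The names toy, bin and res_wide, and the values in toy, are assumptions for illustration; the scalar values_fill and names_sort arguments assume tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Illustrative stand-in for df; these values are made up, not the question's data.
toy <- data.frame(
  x = c("a", "a", "b", "b", "c", "d"),
  y = c(12L, 21L, 9L, 11L, 29L, 35L)
)

res_wide <- toy %>%
  mutate(bin = cut(y, breaks = seq(0, 40, by = 10))) %>%  # interval for each y
  count(x, bin) %>%                                        # one row per (x, bin) pair
  pivot_wider(
    names_from  = bin,
    values_from = n,
    values_fill = 0,     # pairs that never occur get a count of 0
    names_sort  = TRUE   # order the new columns by factor level, not first appearance
  )

res_wide
```

Each row of res_wide is one value of x, and each remaining column holds the number of y values for that x falling in the interval named by the column.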

Or using data.table

library(data.table)
res1 <- dcast.data.table(
  setDT(df)[, list(.N), by = list(x, y1 = cut(y, breaks = seq(0, 40, by = 10)))],
  x ~ y1, value.var = "N", fill = 0L)

all.equal(as.data.frame(res), as.data.frame(res1))
#[1] TRUE
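
Two details of the data.table version may be worth knowing: setDT() converts df to a data.table by reference (so df itself is changed), and recent data.table versions expose the reshaping step through the generic dcast(). The following is a minimal sketch on made-up data, not the answerer's code; dt, bin and wide are hypothetical names.

```r
library(data.table)

# Illustrative data only; not the question's data.
dt <- data.table(x = c("a", "a", "b", "b"),
                 y = c(12L, 21L, 9L, 11L))

# Add the interval for each y as a new column (modifies dt by reference).
dt[, bin := cut(y, breaks = seq(0, 40, by = 10))]

# Count rows per (x, bin) pair while reshaping to wide form.
wide <- dcast(dt, x ~ bin, fun.aggregate = length, value.var = "y")

wide
```

With fun.aggregate = length, combinations of x and bin that never occur come out as 0, so no separate counting step is needed.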

Note: cut has a labels argument, so if you want the column headings to be freq0-10, etc.

tally(group_by(df, x, y = cut(y, breaks = seq(0, 40, by = 10),
      labels = paste0("freq", c("0-10", "10-20", "20-30", "30-40"))))) %>%
  ungroup() %>%
  spread(y, n, fill = 0) %>%
  head(2)

#   x freq0-10 freq10-20 freq20-30 freq30-40
#1 a        0         1         1         0
#2 b        1         1         0         0
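
Since the headings above read like "0-10" and "10-20", it may help to see which interval a boundary value actually falls into. By default cut() builds intervals that are open on the left and closed on the right, so 10 belongs to the first bin and 0 belongs to none. A small sketch (vals and bins are illustrative names, not part of the answer):

```r
vals <- c(1, 10, 11, 20, 40)
bins <- cut(vals, breaks = seq(0, 40, by = 10))

as.character(bins)
# "(0,10]"  "(0,10]"  "(10,20]" "(10,20]" "(30,40]"

# A value equal to the lowest break is not covered by default ...
cut(0, breaks = seq(0, 40, by = 10))
# NA

# ... unless include.lowest = TRUE closes the first interval on the left.
as.character(cut(0, breaks = seq(0, 40, by = 10), include.lowest = TRUE))
# "[0,10]"
```

For the question's data this makes no difference, because sample(1:40, ...) never produces 0.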

data

 df <-  structure(list(x = structure(c(1L, 22L, 3L, 1L, 4L, 5L, 7L, 6L, 
 8L, 24L, 21L, 18L, 19L, 23L, 19L, 4L, 7L, 10L, 21L, 18L, 19L, 
 19L, 19L, 22L, 2L, 7L, 5L, 23L, 19L, 4L, 7L, 8L, 10L, 9L, 20L, 
 5L, 23L, 23L, 17L, 17L, 4L, 22L, 2L, 13L, 13L, 11L, 12L, 21L, 
 16L, 15L, 18L, 20L, 14L, 5L, 23L, 23L, 10L, 6L, 3L, 7L, 8L, 20L, 
 18L, 4L, 5L, 23L, 23L, 23L, 25L, 6L, 7L, 6L, 8L, 8L, 24L, 18L, 
 6L, 6L, 12L), .Label = c("a", "b", "c", "d", "e", "f", "g", "h", 
 "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", 
 "v", "w", "y", "z"), class = "factor"), y = c(12L, 9L, 29L, 21L, 
 27L, 37L, 12L, 31L, 33L, 11L, 25L, 15L, 27L, 27L, 13L, 37L, 8L, 
 2L, 21L, 6L, 4L, 23L, 30L, 6L, 9L, 28L, 4L, 24L, 26L, 2L, 13L, 
 10L, 15L, 6L, 38L, 9L, 30L, 26L, 28L, 39L, 19L, 16L, 11L, 9L, 
 2L, 4L, 16L, 15L, 11L, 14L, 19L, 35L, 19L, 29L, 22L, 40L, 19L, 
 12L, 7L, 6L, 20L, 10L, 12L, 6L, 30L, 13L, 38L, 39L, 30L, 20L, 
 6L, 9L, 1L, 40L, 26L, 14L, 23L, 33L, 2L)), .Names = c("x", "y"
 ), row.names = c(NA, -79L), class = "data.frame")

answered Sep 22 '22 14:09

akrun

Following Ananda Mahto's suggestion, here is an implementation using by, cut, and table.

x = c('a','v','c','a','d','e','g','f','h','y','u','r','s','w','s','d','g','j',
      'u','r','s','s','s','v','b','g','e','w','s','d','g','h','j','i','t','e',
      'w','w','q','q','d','v','b','m','m','k','l','u','p','o','r','t','n','e',
      'w','w','j','f','c','g','h','t','r','d','e','w','w','w','z','f','g','f',
      'h','h','y','r','f','f','l')
y = sample(1:40, 79, replace = TRUE)
dfX = data.frame(x, y)

t(sapply(
  by(dfX$y, list(dfX$x), cut, breaks = c(0, 10, 20, 30, 40)),
  table
))

Here is the output:

> t(sapply(by(dfX$y, list(dfX$x), cut, breaks = c(0, 10, 20, 30, 40)), table))
  (0,10] (10,20] (20,30] (30,40]
a      0       0       0       2
b      0       0       2       0
c      0       1       0       1
d      0       2       2       1
e      2       1       1       1
f      0       4       1       1
g      3       0       1       2
h      2       0       2       1
i      0       0       0       1
j      1       2       0       0
k      1       0       0       0
l      0       1       1       0
m      0       1       0       1
n      0       0       0       1
o      0       1       0       0
p      1       0       0       0
q      0       1       1       0
r      2       1       0       2
s      0       2       0       4
t      1       1       0       1
u      1       0       1       1
v      2       0       0       1
w      6       0       3       0
y      0       1       0       1
z      1       0       0       0
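
Because y is drawn with sample() and no seed is set, rerunning the code will generally not reproduce the counts shown above. The sketch below fixes a seed (42 is an arbitrary choice) and also shows a shorter base R route, which is an alternative to the by/sapply approach rather than the answerer's method: table() cross-tabulates x against the cut() intervals directly. The shortened x vector and the name counts are assumptions for illustration.

```r
set.seed(42)  # arbitrary seed, only so that reruns give the same y

x <- c("a", "v", "c", "a", "d")               # shortened illustrative vector
y <- sample(1:40, length(x), replace = TRUE)

# Rows are the letters in x, columns are the intervals of y.
counts <- table(x, cut(y, breaks = c(0, 10, 20, 30, 40)))

counts
```

Intervals with no observations still appear as all-zero columns, because cut() returns a factor that keeps every level.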

answered Sep 22 '22 14:09

tchakravarty
