x<- c('a','v','c','a','d','e','g','f','h','y','u','r','s','w','s','d','g','j','u','r','s','s','s','v','b','g','e','w','s','d','g','h','j','i','t','e','w','w','q','q','d','v','b','m','m','k','l','u','p','o','r','t','n','e','w','w','j','f','c','g','h','t','r','d','e','w','w','w','z','f','g','f','h','h','y','r','f','f','l')
y <- sample(1:40, 79, replace=T)
y
 [1] 38 18 19 19 37 38 26  4 32 23 11 24 36 15 22 19  6 24 13 36  2 26 35 39  8 33 20 19 23 28  5 17 40 26 18 21
[37] 35 23 27 12  3 33 16 32 11 19  4  5  8 19  5 19 33 33 33 13 12 32 21  4 14  8 28 34 33 22 34 19 39 23  6  8
[73] 37 17 21 16 38 15 36
I have two variables, 'x' and 'y'. Some observations occur more than once in 'x', and every observation in 'x' has a corresponding value in 'y'.
I would like to group by 'x' and also partition the 'y' values into intervals.
To put it differently, the number of times each letter occurs should be split across the specified intervals, according to the value assigned to the letter at each occurrence.
Example:
I could not represent the table properly, as I could not find a better way to type it here.
I hope it is clear. I shall try to restate it if needed. I would appreciate any help in this regard.
In dplyr, group_by() combined with n() counts the number of rows in each group. For example, group_by() can take the “State” and “Name” columns as arguments to group by both, and summarise() with n() then returns the row count for each group.
To get the relative frequency of factor levels with dplyr, start from n(), which gives the number of observations of each value, and divide by the total number of observations.
count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()) .
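The equivalence described above can be checked on a small example. This is a minimal sketch: the `toy` data frame and its columns `a` and `b` are made up for illustration, and the `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
toy <- data.frame(a = c("p", "p", "q", "q", "q"),
                  b = c("u", "u", "u", "v", "v"))

# Short form: one row per distinct (a, b) pair, with the row count in n
via_count <- toy %>% count(a, b)

# Long form: group, then count the rows in each group
via_summarise <- toy %>%
  group_by(a, b) %>%
  summarise(n = n(), .groups = "drop")

# Compare the two results as plain data frames
all.equal(as.data.frame(via_count), as.data.frame(via_summarise))
```

The comparison goes through as.data.frame() because summarise() on grouped data returns a tibble, while count() on a plain data frame returns a plain data frame.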
tabulate() in R counts how often each positive integer occurs in an integer-valued vector. By default the result is a vector whose length equals the largest value in the input, so position i holds the number of times i occurs.
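As a small illustration of tabulate(), the sketch below counts integers directly and then counts interval memberships by tabulating the integer codes that cut() produces. The vectors `v` and `vals` are made-up examples, not the data from the question.

```r
# Hypothetical input: the integers 2, 3, 3 and 5
v <- c(2L, 3L, 3L, 5L)
int_counts <- tabulate(v)   # position i holds the number of times i occurs

# Binning first: cut() returns a factor whose integer codes index the
# intervals (0,10], (10,20], (20,30], (30,40]
vals <- c(4, 12, 19, 33)
bins <- cut(vals, breaks = seq(0, 40, by = 10))
bin_counts <- tabulate(as.integer(bins), nbins = nlevels(bins))

int_counts   # 0 1 2 0 1
bin_counts   # 1 2 0 1
```

Passing nbins keeps trailing empty intervals in the result, which tabulate() would otherwise drop when the last interval has no observations.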
Using dplyr
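library(dplyr)
library(tidyr)
# Bin y into (0,10], (10,20], (20,30], (30,40], count rows per (x, bin),
# then spread the bins into columns, filling missing combinations with 0
res <- tally(group_by(df, x, y = cut(y, breaks = seq(0, 40, by = 10)))) %>%
  ungroup() %>%
  spread(y, n, fill = 0)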
Or using data.table
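library(data.table)
# Count rows per (x, bin), then cast the bins into columns.
# setDT() converts df to a data.table by reference; dcast() dispatches to
# the data.table method (called as dcast.data.table in older versions)
res1 <- dcast(setDT(df)[, list(.N),
                        by = list(x, y1 = cut(y, breaks = seq(0, 40, by = 10)))],
              x ~ y1, value.var = "N", fill = 0L)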
all.equal(as.data.frame(res), as.data.frame(res1))
#[1] TRUE
Note: cut() has a labels argument, so if you want the column headings to be freq0-10, etc.:
tally(group_by(df, x, y = cut(y, breaks = seq(0, 40, by = 10),
      labels = paste0("freq", c("0-10", "10-20", "20-30", "30-40"))))) %>%
  ungroup() %>%
  spread(y, n, fill = 0) %>%
  head(2)
#   x freq0-10 freq10-20 freq20-30 freq30-40
#1  a        0         1         1         0
#2  b        1         1         0         0
Data used in the examples above:
df <- structure(list(x = structure(c(1L, 22L, 3L, 1L, 4L, 5L, 7L, 6L,
8L, 24L, 21L, 18L, 19L, 23L, 19L, 4L, 7L, 10L, 21L, 18L, 19L,
19L, 19L, 22L, 2L, 7L, 5L, 23L, 19L, 4L, 7L, 8L, 10L, 9L, 20L,
5L, 23L, 23L, 17L, 17L, 4L, 22L, 2L, 13L, 13L, 11L, 12L, 21L,
16L, 15L, 18L, 20L, 14L, 5L, 23L, 23L, 10L, 6L, 3L, 7L, 8L, 20L,
18L, 4L, 5L, 23L, 23L, 23L, 25L, 6L, 7L, 6L, 8L, 8L, 24L, 18L,
6L, 6L, 12L), .Label = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
"v", "w", "y", "z"), class = "factor"), y = c(12L, 9L, 29L, 21L,
27L, 37L, 12L, 31L, 33L, 11L, 25L, 15L, 27L, 27L, 13L, 37L, 8L,
2L, 21L, 6L, 4L, 23L, 30L, 6L, 9L, 28L, 4L, 24L, 26L, 2L, 13L,
10L, 15L, 6L, 38L, 9L, 30L, 26L, 28L, 39L, 19L, 16L, 11L, 9L,
2L, 4L, 16L, 15L, 11L, 14L, 19L, 35L, 19L, 29L, 22L, 40L, 19L,
12L, 7L, 6L, 20L, 10L, 12L, 6L, 30L, 13L, 38L, 39L, 30L, 20L,
6L, 9L, 1L, 40L, 26L, 14L, 23L, 33L, 2L)), .Names = c("x", "y"
), row.names = c(NA, -79L), class = "data.frame")
Following Ananda Mahto's suggestion, here is an implementation using by, cut, and table.
x = c('a','v','c','a','d','e','g','f','h','y','u','r','s','w','s','d','g','j',
'u','r','s','s','s','v','b','g','e','w','s','d','g','h','j','i','t','e',
'w','w','q','q','d','v','b','m','m','k','l','u','p','o','r','t','n','e',
'w','w','j','f','c','g','h','t','r','d','e','w','w','w','z','f','g','f',
'h','h','y','r','f','f','l')
y = sample(1:40, 79, replace = TRUE)
dfX = data.frame(x, y)
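# For each letter, bin its y values with cut(), tabulate the bins, and
# stack the per-letter counts into a matrix with one row per letter
t(sapply(
  by(dfX$y, list(dfX$x), cut, breaks = c(0, 10, 20, 30, 40)),
  table
))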
Here is the output:
> t(sapply(by(dfX$y, list(dfX$x), cut, breaks = c(0, 10, 20, 30, 40)), table))
  (0,10] (10,20] (20,30] (30,40]
a      0       0       0       2
b      0       0       2       0
c      0       1       0       1
d      0       2       2       1
e      2       1       1       1
f      0       4       1       1
g      3       0       1       2
h      2       0       2       1
i      0       0       0       1
j      1       2       0       0
k      1       0       0       0
l      0       1       1       0
m      0       1       0       1
n      0       0       0       1
o      0       1       0       0
p      1       0       0       0
q      0       1       1       0
r      2       1       0       2
s      0       2       0       4
t      1       1       0       1
u      1       0       1       1
v      2       0       0       1
w      6       0       3       0
y      0       1       0       1
z      1       0       0       0
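For comparison, the same kind of letter-by-interval count can also be sketched in base R by cross-tabulating the letters against the binned values directly. This is an illustrative alternative, not the code from the answers above, and the `toy` data frame is a made-up five-row example.

```r
# Hypothetical toy data: letters in x, values in y
toy <- data.frame(x = c("a", "b", "a", "c", "b"),
                  y = c(12, 9, 21, 29, 11))

# Cross-tabulate each letter against the interval its value falls into;
# rows are letters, columns are (0,10], (10,20], (20,30], (30,40]
counts <- table(toy$x, cut(toy$y, breaks = seq(0, 40, by = 10)))
counts
```

Because cut() returns a factor that carries all four intervals as levels, intervals with no observations still appear as columns of zeros.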