
Creating column that lists distinct observations

I have a data frame of observations that looks like this (it shows the course numbers of college classes offered each term). The columns are very long and of varying lengths.

  spring   summer   fall
   4a       5b       5c
   4a       9c       11b
   7c       5b       8a 
   ...      ...      ...

I want to reformat it to look like this. First, I want to create a column, "Course_Names", that lists every distinct course offered. Then, I want to count the number of sections of each course offered each semester.

   Course_Names   spring   summer   fall
   4a             2        0        0
   5b             0        2        0
   5c             0        0        1
   7c             1        0        0
   8a             0        0        1
   9c             0        1        0
   11b            0        0        1        

Any advice or links to relevant posts would be very much appreciated! Thank you!

Anna Jones asked Nov 28 '25 20:11


1 Answer

In base R, one option is to stack the data.frame into a two-column dataset (values and the column each value came from) and then call table on it:

table(stack(df1))
#    ind
#values spring summer fall
#   11b      0      0    1
#   4a       2      0    0
#   5b       0      2    0
#   5c       0      0    1
#   7c       1      0    0
#   8a       0      0    1
#   9c       0      1    0
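
The table above has the course names as row names rather than as a "Course_Names" column, and sorts them as text (so 11b comes first). A minimal sketch of one way to get closer to the layout in the question, assuming course names always start with a number; the names counts, wide, result and course_num are only illustrative:

```r
# Same sample data as in the "data" section of this answer
df1 <- data.frame(
  spring = c("4a", "4a", "7c"),
  summer = c("5b", "9c", "5b"),
  fall   = c("5c", "11b", "8a")
)

# Cross-tabulate course (rows) by term (columns)
counts <- table(stack(df1))

# Turn the 2-d table into a wide data.frame; course names become row names
wide <- as.data.frame.matrix(counts)

# Move the row names into an ordinary column
result <- data.frame(Course_Names = rownames(wide), wide, row.names = NULL)

# Assumption: names begin with digits, so sort by that number, then by name
course_num <- as.integer(sub("[^0-9].*$", "", result$Course_Names))
result <- result[order(course_num, result$Course_Names), ]
rownames(result) <- NULL

result
```

If the shorter columns are padded with NA, table drops those entries by default, so they should not show up as a course.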

Or, in the tidyverse, we can reshape into 'long' format with pivot_longer, get the counts with count, and reshape back into 'wide' format with pivot_wider:

library(dplyr)
library(tidyr)
df1 %>%
    pivot_longer(everything()) %>%
    count(name, Course_Names = value) %>%
    pivot_wider(names_from = name, values_from = n, values_fill = list(n = 0))
# A tibble: 7 x 4
#  Course_Names  fall spring summer
#  <chr>        <int>  <int>  <int>
#1 11b              1      0      0
#2 5c               1      0      0
#3 8a               1      0      0
#4 4a               0      2      0
#5 7c               0      1      0
#6 5b               0      0      2
#7 9c               0      0      1
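
The question mentions columns of varying lengths. In a data.frame that usually means the shorter columns are padded with NA, and count would then report NA as if it were a course. A sketch of how the pipeline above could be adapted, using a hypothetical NA-padded df2 (not the asker's data) and the illustrative names term and result2:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: summer and fall have fewer entries, padded with NA
df2 <- data.frame(
  spring = c("4a", "4a", "7c"),
  summer = c("5b", "9c", NA),
  fall   = c("5c", NA, NA)
)

result2 <- df2 %>%
  # values_drop_na = TRUE discards the NA padding while reshaping
  pivot_longer(everything(), names_to = "term",
               values_to = "Course_Names", values_drop_na = TRUE) %>%
  count(term, Course_Names) %>%
  pivot_wider(names_from = term, values_from = n,
              values_fill = list(n = 0)) %>%
  # restore the original column order (spring, summer, fall)
  select(Course_Names, all_of(names(df2)))

result2
```

The final select is optional; it only puts the term columns back in the order they had in the input instead of the alphabetical order produced by count.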

data

df1 <- structure(list(spring = c("4a", "4a", "7c"), summer = c("5b", 
"9c", "5b"), fall = c("5c", "11b", "8a")), class = "data.frame", row.names = c(NA, 
-3L))
akrun answered Dec 01 '25 11:12